Abstract

Concurrent logic languages are high-level programming languages for parallel and distributed
systems that offer a wide range of both known and novel concurrent programming techniques. Being
logic programming languages, they preserve many advantages of the abstract logic programming
model, including the logical reading of programs and computations, the convenience of representing
data-structures with logical terms and manipulating them using unification, and the amenability
to meta-programming. Operationally, their model of computation consists of a dynamic set of
concurrent processes, communicating by instantiating shared logical variables, synchronising by
waiting for variables to be instantiated, and making nondeterministic choices, possibly based on
the availability of values of variables.

This paper surveys the family of concurrent logic programming languages within a uniform
operational framework. It demonstrates the expressive power of even the simplest language in the
family, and investigates how varying the basic synchronisation and control constructs affects the
expressiveness and efficiency of the resulting languages.

In  addition,  the  paper  reports  on  techniques  for  sequential  and  parallel  implementation  of 
languages  in  this  family,  mentions  their  applications  to  date,  and  relates  these  languages  to  the 
abstract  logic  programming  model,  to  the  programming  language  Prolog,  and  to  other  concurrent 
computational models and programming languages.
PART I  INTRODUCTION


1.  Introduction 

In  surveying  concurrent  logic  programming  languages,  this  paper: 

•  Introduces the computational models of logic programs, Prolog, and concurrent logic
languages.

•  Discusses  the  different  role  of  nondeterminism  in  these  three  computational  models. 

•  Explains  the  use  of  the  logical  variable  as  a  communication  channel,  and  the  use  of  unification 
in  the  specification  and  implementation  of  sophisticated  communication  protocols. 

•  Demonstrates the powerful programming techniques available in concurrent logic languages,
including: stream processing, the formation and manipulation of dynamic process networks,
incomplete-message protocols for dialogues and network configuration, concurrent construction
of shared data-structures, and short-circuit protocols for distributed termination and
quiescence detection.

•  Demonstrates the utility of enhanced meta-interpreters in concurrent logic programming,
including their application to computation control, to the formation of live and frozen snapshots,
and to computation replay and debugging.

•  Exposes  the  spectrum  of  concurrent  logic  programming  languages,  ranging  from  the  simpler 
and  weaker  ones,  to  the  more  complex  and  more  expressive  ones. 

•  Reports on implementation techniques for sequential and parallel computers developed for
concurrent logic languages, as well as on specialized architectures designed for them.

The  paper  does  not  aim  at  providing  a  historical  account  of  the  development  of  concurrent 
logic  languages.  Rather,  it  attempts  to  expose  the  core  concepts  of  these  languages,  as  well  as 
the  internal  structure  of  the  family  and  the  qualities  of  each  of  its  members,  within  a  consistent 
operational  framework.  As  a  result,  usually  an  idealized  or  a  simplified  version  of  each  language  is 
described.  When  applicable,  the  differences  from  the  actual  language,  as  well  as  relevant  historical 
facts, are noted.¹

The  paper  consists  of  five  parts.  In  the  remainder  of  Part  I,  Section  2  surveys  briefly  the 
abstract  computational  model  of  logic  programming  and  of  (pure)  Prolog,  explaining  the  role  of 
the  logical  variable,  unification,  and  nondeterminism  in  this  model. 

Part  II  conveys  the  core  concepts  and  techniques  of  concurrent  logic  programming.  Section  3 
introduces  the  basic  concepts  of  concurrent  logic  programming,  and  the  use  of  shared  logical 
variables  for  communication  and  synchronization.  Section  4  defines  a  simple  concurrent  logic 
language.  This  language  is  used  in  Section  5  to  illustrate  basic  concurrent  logic  programming 
examples  and  techniques.  Section  6  discusses  fairness  conditions  for  concurrent  logic  programs. 
Following  that,  Section  7  describes  advanced  concurrent  logic  programming  techniques.  Although 
this  part  uses  a  particular  concurrent  logic  language,  both  the  basic  and  advanced  techniques 
shown  are  common  to  most  programming  languages  in  the  family;  exceptions  are  noted  when  each 
language  is  introduced. 

Part III surveys the various members of the family of concurrent logic languages. Section 8
describes our method of comparing languages in the family. We compare languages for their
expressiveness, simplicity, readability and efficiency. In comparing expressiveness, we explore embeddings
among languages and the programming techniques provided by each language.

Section 9 discusses the semantics of concurrent logic programs. Sections 10 to 13 introduce
and compare flat concurrent logic languages. A flat language is defined with respect to a given fixed
set of primitive predicates (in the languages discussed these include mainly equality, inequality and
arithmetic tests). In a flat language a process can perform only a simple computation, specified
by a conjunction of atoms with primitive predicates, before making a committed nondeterministic
choice. In non-flat languages such pre-commit computations may involve program-defined predicates,
and thus can be arbitrarily complex. During a computation of a non-flat language the processes
form an And/Or-tree, whereas in a flat language the processes are a “flat” collection; hence their
name. Non-flat concurrent logic languages are surveyed in Section 18.

¹ For additional historical notes see [145,164].

Part  IV  describes  implementations  developed  for  concurrent  logic  languages,  and  references 
their  applications.  Implementation  techniques  for  both  sequential  and  parallel  computers  are 
reviewed,  as  well  as  specialised  architectures  designed  for  their  efficient  execution. 

Part V concludes the paper by comparing the concurrent logic programming model with other
approaches to programming and modeling concurrency, including Prolog, dataflow languages,
functional languages, message-passing models of concurrency, object-oriented languages, and
nondeterministic transition systems.

How  to  read  the  paper 

The reader who wishes only to understand a single concurrent logic language can skim Part I and
read Part II. There are sufficient intuitive explanations and examples so that the formal treatment
of the semantics of logic programs can be skipped without loss of continuity. The reader interested
in implementation techniques can read Section 20 of Part IV without reading Part III.


2.  Logic  Programming,  Prolog,  and  the  Power  of  the  Logical  Variable 

This section introduces the logic programming computational model. It defines pure Prolog and
relates  it  to  the  logic  programming  model.  It  discusses  properties  of  the  logical  variable  and 
unification  and  their  relation  to  conventional  data-manipulation  operations. 

2.1  Syntax  and  informal  semantics  of  logic  programs 

We  use  the  Edinburgh  syntax  [11]  for  logical  variables,  terms,  and  predicates. 

Definitions: Term, atom, clause, logic program, vocabulary.

•  A term is a variable (e.g. X) or a function symbol of arity $n \geq 0$, applied to n terms (e.g. c
and f(a,X,g(b,Y))).

•  An atom is a formula of the form $p(T_1,\ldots,T_n)$, where p is a predicate of arity n and
$T_1,\ldots,T_n$ are terms.

•  A definite clause (clause for short) is a formula of the form:

$A \leftarrow B_1,\ldots,B_n, \quad n \geq 0,$

where A is an atom and $B_1,\ldots,B_n$ is a sequence of atoms. A is called the clause’s head, and
$B_1,\ldots,B_n$ its body. We denote the empty sequence of atoms by true.

•  A logic program is a finite set of definite clauses.

•  A goal is a sequence of atoms $A_1,A_2,\ldots,A_n$. A goal is empty if n=0, atomic if n=1, and
conjunctive if n>1. Each atom in a goal is called a goal atom. A goal atom is often also
called a goal for short.

•  The vocabulary of a logic program P is the set of predicates and function symbols that occur
in the clauses of P.

We use the Edinburgh notation for lists (and also for streams, as discussed below). The term
[X|Xs] (read “X cons Xs”) is a list whose head is X and tail is Xs, and the constant [ ] (read
“nil”) is used by convention to denote the empty list.

Informal  semantics  of  logic  programs 

Logic programs can be read both declaratively and operationally. We describe these two views
here informally, and make them precise in Section 2.4 below.

Declaratively, each clause in a logic program is read as a universally quantified implication.
If $X_1,X_2,\ldots,X_n$ are the variables in the clause $A \leftarrow B_1,\ldots,B_m$, then the clause is read “for all
$X_1,\ldots,X_n$, A is true if $B_1$ and $B_2$ and ... and $B_m$ are true”. A logic program is read as the
conjunction of the universal implications corresponding to its clauses.

Operationally,  logic  programs  can  be  viewed  as  an  abstract  computational  model,  like  the 
Turing  Machine,  the  Lambda  Calculus,  and  the  Random  Access  Machine.  A  computation  in  this 
model  is  a  goal-driven  deduction  from  the  clauses  of  the  program.  Like  the  nondeterministic  Turing 
machine,  computations  in  this  model  are  nondeterministic:  from  each  state  of  the  computation 
there  may  be  several  possible  transitions.  Specifically,  the  clauses  of  a  logic  program  can  be  read 
as  transition  rules  of  a  nondeterministic  transition  system. 

The state of a computation consists of a goal (sequence of atoms) G and a substitution
(assignment of values to variables) $\theta$, and is denoted by a pair $(G;\theta)$. A computation begins
with an initial state consisting of the initial goal to be proven and the empty substitution $\varepsilon$,
and progresses nondeterministically from state to state according to the following transition rules,
Reduce and Fail. A computation can be viewed as an attempt to prove the initial goal from
the program. At each state the goal represents a statement whose proof will establish the initial
goal; the substitution represents the values computed so far for variables used in the computation,
including the initial goal variables. A computation ends in a state whose goal is either true or
fail. In the former case the computation is successful, and it corresponds to a successful proof of
the initial goal. In the latter it is failed. The substitution in the terminal state, restricted to the
variables in the initial goal, is called the answer substitution of the computation.

A  successful  computation  has  the  property  that  its  initial  goal,  instantiated  by  the  answer 
substitution,  is  a  logical  consequence  of  the  program. 

A key step in the transitions is the unification of a goal atom with the head of a clause.
Intuitively, a unifier of two terms $T_1$ and $T_2$ is a substitution $\theta$ whose application to $T_1$ and $T_2$
yields the same term, i.e. $T_1\theta = T_2\theta$. The unification of two terms $T_1$ and $T_2$ returns their most
general (“simplest”) unifier $\theta$ if there is one, or fail if there is none. The two cases are denoted
by $mgu(T_1,T_2)=\theta$ and $mgu(T_1,T_2)=fail$, respectively. For example, the most general unifier of
f(X,b) and f(g(Y),Z) is the substitution $\{X \mapsto g(Y), Z \mapsto b\}$. Examples of other (less general)
unifiers are $\{X \mapsto g(a), Y \mapsto a, Z \mapsto b\}$, $\{X \mapsto g(b), Y \mapsto b, Z \mapsto b\}$, and $\{X \mapsto g(g(W)), Y \mapsto g(W), Z \mapsto b\}$.
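The informal description of unification above can be made concrete with a small Python sketch. It is only an illustration, not the algorithm of any particular implementation: the encoding of terms (variables as capitalized strings, constants as lower-case strings, compound terms as tuples headed by the function symbol) and the names is_var, walk, occurs, resolve and mgu are assumptions made here. The sketch includes an occurs check, which the text above does not discuss.

```python
def is_var(t):
    """Variables are strings starting with an upper-case letter (Edinburgh style)."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    """Follow variable bindings in substitution s until an unbound term is reached."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """True if variable v occurs in term t under substitution s."""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def resolve(t, s):
    """Apply substitution s exhaustively to term t."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, s) for a in t[1:])
    return t

def mgu(t1, t2):
    """Return a most general unifier of t1 and t2 as a dict, or None for fail."""
    s, stack = {}, [(t1, t2)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if is_var(a) or is_var(b):
            v, t = (a, b) if is_var(a) else (b, a)
            if occurs(v, t, s):
                return None
            s[v] = t
        elif (isinstance(a, tuple) and isinstance(b, tuple)
              and a[0] == b[0] and len(a) == len(b)):
            stack.extend(zip(a[1:], b[1:]))   # unify the arguments pairwise
        else:
            return None                        # clash of function symbols
    return {v: resolve(t, s) for v, t in s.items()}

# The example from the text: f(X,b) and f(g(Y),Z).
theta = mgu(("f", "X", "b"), ("f", ("g", "Y"), "Z"))
print(theta)
```

Applying the returned substitution to both terms with resolve yields the same term, which is the defining property of a unifier.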

We denote the ability to move from a state S to a state S' using a transition rule t by $S \xrightarrow{t} S'$.
Substitutions can be viewed as functions from variables to values (see Section 2.4), hence we use
$\theta \circ \theta'$ to denote the substitution whose application has the effect of applying $\theta$ then applying
$\theta'$. The Reduce and Fail transition rules require that the variables in the clause be consistently
replaced by new variables that have not been used before in the computation. A clause to which
this replacement has been applied is called renamed apart. The requirement to rename a clause is
inherited from the resolution rule, and ensures that clauses are “re-entrant”.

There are two transition rules:

1.  Reduce

$(A_1,\ldots,A_i,\ldots,A_n;\ \theta) \xrightarrow{Reduce} ((A_1,\ldots,B_1,\ldots,B_k,\ldots,A_n)\theta';\ \theta \circ \theta')$

if $mgu(A_i,A) = \theta'$ for some renamed apart clause $A \leftarrow B_1,\ldots,B_k$ of P.

2.  Fail

$(A_1,\ldots,A_i,\ldots,A_n;\ \theta) \xrightarrow{Fail} (fail;\ \theta)$

if for some i and for every renamed apart clause $A \leftarrow B_1,\ldots,B_k$ of P, $mgu(A_i,A) = fail$.

Reduce has the following property: If $(G;\theta) \xrightarrow{Reduce} (G';\theta \circ \theta')$, with mgu $\theta'$, then $G\theta'$ is a
logical consequence of the program and G'. This implies, by induction, that the initial goal, to
which the answer substitution of a successful computation is applied, is a logical consequence of
the program.

Note that there are two types of nondeterministic choices in the Reduce transition: which goal
atom to reduce, and which clause to reduce it with. The first is called And-nondeterminism, the
second Or-nondeterminism. Fail has only an And-nondeterministic choice.

A  computation  progresses  until  it  reaches  a  terminal  state,  which  is  a  state  to  which  no 
transition  applies.  By  the  definition  of  Reduce  and  Fail,  the  goal  in  a  terminal  state  is  either  true 
or  fail. 


2.2  Examples  of  logic  programs  and  their  computations 

We show some simple logic programs and illustrate their operational behavior. The following logic
program defines the predicate sum(Xs,S), which holds if S is the sum of the elements of the list
Xs.

sum(Xs,S) ←                  %1
    sum'(Xs,0,S).

sum'([ ],S,S).               %2

sum'([X|Xs],P,S) ←           %3
    plus(X,P,P'),
    sum'(Xs,P',S).

The program uses an auxiliary predicate sum'(Xs,P,S), which holds if the sum of the Xs plus P is
S, and the predicate plus(X,Y,Z), which holds if X plus Y is Z. For the purpose of this example
we assume that plus is defined by a large set of facts, including:

plus(0,0,0).                 %4
plus(0,1,1).                 %5
plus(1,0,1).                 %6
plus(2,0,2).                 ...

To increase readability of the following examples of computations we annotate Reduce
transitions with a label (i,j) identifying the indices of the goal atom and program clause that were used for
reduction, Fail transitions with the index of the failing goal atom, and restrict the substitution in
a state to the initial goal variables.

An example of a successful computation of the above program is:

(sum([1,2],S); ε)  --Reduce(1,1)-->
(sum'([1,2],0,S); ε)  --Reduce(1,3)-->
(plus(1,0,P), sum'([2],P,S); ε)  --Reduce(1,6)-->
(sum'([2],1,S); ε)  --Reduce(1,3)-->
(plus(2,1,P'), sum'([ ],P',S); ε)  --Reduce(1,j)-->
(sum'([ ],3,S); ε)  --Reduce(1,2)-->
(true; {S ↦ 3})

Here j stands for the index of the fact plus(2,1,3) in the full list of plus facts.

An example of a failing computation is:

(sum([1,2],S); ε)  --Reduce(1,1)-->
(sum'([1,2],0,S); ε)  --Reduce(1,3)-->
(plus(1,0,P), sum'([2],P,S); ε)  --Reduce(2,3)-->
(plus(1,0,P), plus(2,P,P'), sum'([ ],P',S); ε)  --Reduce(2,j)-->
(plus(1,0,2), sum'([ ],4,S); ε)  --Fail(1)-->
(fail; ε)

Here j stands for the index of the fact plus(2,2,4) in the full list of plus facts.

The failure in the last computation could have been avoided by deferring the reduction of the
goal atom plus(2,P,P') until more information was available. The Reduce transition of Prolog,
introduced in Section 2.5 below, always chooses the leftmost atom in the goal for reduction. Thus a
Prolog computation on an initial goal sum(Xs,S) whose first argument is a complete list³ of integers
and whose second argument is a variable is bound to succeed. Furthermore, such a computation is
deterministic, in the sense that at each step only one clause head unifies with the selected goal
atom. Concurrent logic languages use other mechanisms to delay the reduction of a goal atom,
which do not impose such strict sequentiality.

The following logic program defines the relation in_both(X,L1,L2), which holds if X is a
member of both lists L1 and L2. It uses the auxiliary predicate member(X,Xs), which holds if X
is a member of the list Xs.

% in_both(X,L1,L2) ← X is a member of both lists L1 and L2.

in_both(X,L1,L2) ←           %1
    member(X,L1), member(X,L2).

% member(X,Xs) ← X is a member of the list Xs.

member(X,[X|Xs]).            %2

member(X,[X1|Xs]) ←          %3
    member(X,Xs).

Here are two possible computations from the goal in_both(X,[a,b],[b,c]). A failing computation, in
which X is chosen to be a, and the computation of the remaining goal member(a,[b,c]) fails:

(in_both(X,[a,b],[b,c]); ε)  --Reduce(1,1)-->
(member(X,[a,b]), member(X,[b,c]); ε)  --Reduce(1,2)-->
(member(a,[b,c]); {X ↦ a})  --Reduce(1,3)-->
(member(a,[c]); {X ↦ a})  --Reduce(1,3)-->
(member(a,[ ]); {X ↦ a})  --Fail(1)-->
(fail; {X ↦ a})

A successful computation from the same goal, in which X is chosen to be b:

(in_both(X,[a,b],[b,c]); ε)  --Reduce(1,1)-->
(member(X,[a,b]), member(X,[b,c]); ε)  --Reduce(1,3)-->
(member(X,[b]), member(X,[b,c]); ε)  --Reduce(1,2)-->
(member(b,[b,c]); {X ↦ b})  --Reduce(1,2)-->
(true; {X ↦ b})

³ A list is complete if every instance of it is a list [171]; [a,b], [X,b], and [X,Y] are complete lists, and [a,b|Xs],
[a|Xs], [X|Xs] and Xs are examples of incomplete lists [171].


For this program, no ordering of goal atoms can make the computation deterministic on an
initial goal atom whose first argument is a variable.
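The Reduce and Fail transitions can be explored mechanically. The following Python sketch enumerates every computation of the in_both program on the goal in_both(X,[a,b],[b,c]), covering both And- and Or-nondeterministic choices. It is an illustration under stated assumptions rather than a faithful implementation: the term encoding, the helper names (unify, rename, lst, successors, computations) and the use of a step counter for renaming apart are choices made here, the occurs check is omitted, and instead of applying each mgu to the goal the sketch accumulates bindings in one substitution and dereferences on demand.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s so that a and b become equal; None means fail."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    t = walk(t, s)
    return (t[0],) + tuple(resolve(x, s) for x in t[1:]) if isinstance(t, tuple) else t

def rename(t, n):
    """Rename a clause term apart by tagging its variables with the step number n."""
    if is_var(t):
        return f"{t}_{n}"
    return (t[0],) + tuple(rename(x, n) for x in t[1:]) if isinstance(t, tuple) else t

def lst(*xs):
    """Build the list term [x1,...,xn] from cons cells ('.', Head, Tail)."""
    t = "[]"
    for x in reversed(xs):
        t = (".", x, t)
    return t

PROGRAM = [  # (head, body); clause numbers follow the listing in the text
    (("in_both", "X", "L1", "L2"), (("member", "X", "L1"), ("member", "X", "L2"))),
    (("member", "X", (".", "X", "Xs")), ()),
    (("member", "X", (".", "X1", "Xs")), (("member", "X", "Xs"),)),
]

def successors(goal, theta, step):
    """All (label, goal', theta') reachable from (goal; theta) by Reduce or Fail."""
    out = []
    for i, atom in enumerate(goal):                      # And-nondeterminism
        reducible = False
        for j, (head, body) in enumerate(PROGRAM, 1):    # Or-nondeterminism
            s = unify(atom, rename(head, step), theta)
            if s is not None:
                reducible = True
                new_body = tuple(rename(b, step) for b in body)
                out.append((f"Reduce({i+1},{j})", goal[:i] + new_body + goal[i+1:], s))
        if not reducible:
            out.append((f"Fail({i+1})", "fail", theta))
    return out

def computations(goal, theta=None, step=1, trace=()):
    """Enumerate every computation as (trace, final goal, final substitution)."""
    theta = {} if theta is None else theta
    if goal == () or goal == "fail":                     # terminal state
        yield trace, goal, theta
        return
    for label, g, s in successors(goal, theta, step):
        yield from computations(g, s, step + 1, trace + (label,))

runs = list(computations((("in_both", "X", lst("a", "b"), lst("b", "c")),)))
answers = {resolve("X", s) for _, g, s in runs if g == ()}
print(answers, sum(g == "fail" for _, g, _ in runs))
```

Both computations traced in the text appear among the enumerated runs, and every successful run binds X to b, while other runs end in fail whichever goal atom is selected first.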

The following logic program uses the difference-list technique for efficient list concatenation, so
we digress to explain it. A difference-list is a term representing a list as the difference between two
(possibly incomplete) lists. By convention, the term H\T is used, where ‘\’ is a binary function
symbol written in infix notation; H is called the head and T the tail of the difference-list. Examples
of difference-lists representing the list [a,b,c] are [a,b,c]\[ ], [a,b,c,d,e]\[d,e], and [a,b,c|Xs]\Xs.
Given two difference-lists H1\T1 and H2\T2, if T1=H2 then H1\T2 is their concatenation. It is easy
to see that the list represented by H1\T2 is the concatenation of the lists represented by H1\T1
and H2\T2. For example, the concatenation of [a,b,c,d,e]\[d,e] and [d,e]\[e] is [a,b,c,d,e]\[e].
Operationally, the precondition for difference-list concatenation, i.e. T1=H2, is usually met by
keeping T1 a variable. For example, [a,b,c|Xs]\Xs can be concatenated to any difference-list. Its
concatenation with [d,e|Ys]\Ys gives [a,b,c,d,e|Ys]\Ys, which can be further concatenated to any
list.

Difference-lists are the preferred representation of lists when concatenation is required.
Programs that use difference-lists do not require explicit list concatenation using a predicate like
append, and are thus more efficient both in time and in space. Operationally, they achieve an
effect similar to that of rplacd in Lisp, but without destructive data-manipulation operations.
However, the precondition for difference-list concatenation, i.e. T1=H2, cannot always be met, for
example when the same list needs to be concatenated to several lists.
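The constant-time concatenation of difference-lists, and its limitation, can be imitated in Python with an explicit single-assignment tail variable. This is a sketch under assumptions made here: the class Var, the pair encoding of cons cells and the helpers diff_list, concat and to_list are illustrative names, not part of any language discussed in this paper.

```python
class Var:
    """An unbound tail variable; binding it twice is an error (single assignment)."""
    def __init__(self):
        self.bound, self.value = False, None

    def bind(self, value):
        if self.bound:
            raise ValueError("tail variable is already bound")
        self.bound, self.value = True, value

NIL = "[]"

def deref(t):
    """Follow bound variables to the term they stand for."""
    while isinstance(t, Var) and t.bound:
        t = t.value
    return t

def diff_list(items):
    """Build the difference-list [x1,...,xn|Tail]\\Tail as a pair (head, tail)."""
    tail = Var()
    head = tail
    for x in reversed(items):
        head = (x, head)          # cons cell
    return head, tail

def concat(d1, d2):
    """Concatenate H1\\T1 and H2\\T2 by binding T1 to H2; the result is H1\\T2."""
    (h1, t1), (h2, t2) = d1, d2
    t1.bind(h2)                   # meets the precondition T1 = H2 in one step
    return h1, t2

def to_list(d):
    """Close the difference-list with [ ] and copy its elements to a Python list."""
    head, tail = d
    tail.bind(NIL)
    out, t = [], deref(head)
    while t != NIL:
        x, rest = t
        out.append(x)
        t = deref(rest)
    return out

abc = diff_list(["a", "b", "c"])
de = diff_list(["d", "e"])
print(to_list(concat(abc, de)))
```

Because concat binds the tail variable of its first argument, attempting to concatenate the same difference-list to a second list raises an error in this sketch, which mirrors the case where the precondition cannot be met.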

The third example is a recursive program for flattening a tree. It operates on trees constructed
recursively from tree(L,R) and leaf(X), where L and R are themselves trees, and X is the value
at a leaf. The predicate flatten(T,Xs) holds if Xs is the list of values at the leaves of the tree
T, ordered from left to right. The predicate flatten'(T,Xs\Ys) holds if the difference-list Xs\Ys
represents the list thus defined.


flatten(T,Xs) ←                      %1
    flatten'(T,Xs\[ ]).

flatten'(leaf(X),[X|Xs]\Xs).         %2

flatten'(tree(L,R),Xs\Zs) ←          %3
    flatten'(L,Xs\Ys),
    flatten'(R,Ys\Zs).

The program employs several standard difference-list clichés. The call from flatten to flatten' in
Clause 1 employs the standard translation between lists and difference-lists: if H\T represents the
list L and T=[ ] then H=L. flatten' returns a singleton difference-list in Clause 2, and implicitly
concatenates the difference-lists representing the leaves of the subtrees by calling the tail of the
first and the head of the second with the same name, Ys, in Clause 3.

The program has only deterministic and successful computations on initial goals flatten(T,Xs),
where T is a complete tree and Xs is a variable. For example:

(flatten(tree(leaf(a),tree(leaf(b),leaf(c))),Xs); ε)  --Reduce(1,1)-->
(flatten'(tree(leaf(a),tree(leaf(b),leaf(c))),Xs\[ ]); ε)  --Reduce(1,3)-->
(flatten'(leaf(a),Xs\Ys), flatten'(tree(leaf(b),leaf(c)),Ys\[ ]); ε)  --Reduce(1,2)-->
(flatten'(tree(leaf(b),leaf(c)),Ys\[ ]); {Xs ↦ [a|Ys]})  --Reduce(1,3)-->
(flatten'(leaf(b),Ys\Ys'), flatten'(leaf(c),Ys'\[ ]); {Xs ↦ [a|Ys]})  --Reduce(1,2)-->
(flatten'(leaf(c),Ys'\[ ]); {Xs ↦ [a,b|Ys']})  --Reduce(1,2)-->
(true; {Xs ↦ [a,b,c]})

The  other  three  possible  computations  on  the  same  initial  goal  would  also  be  deterministic  and 
yield  the  same  answer  substitution. 
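For readers more familiar with conventional languages, the threading of the difference-list through the flatten program can be approximated in Python by passing the tail as an argument. This is only a rough analogue, with names chosen here for illustration: Python has no logical variables, so the value playing the role of Ys must be computed from the right subtree before the left subtree can use it, whereas in the logic program the two body goals of Clause 3 can be reduced in either order.

```python
def flatten_dl(tree, tail):
    """Return Xs such that the difference-list Xs\\tail holds the leaves of tree."""
    if tree[0] == "leaf":
        return (tree[1], tail)                 # Clause 2: a singleton difference-list
    _, left, right = tree
    ys = flatten_dl(right, tail)               # second body goal of Clause 3
    return flatten_dl(left, ys)                # first body goal of Clause 3

def flatten(tree):
    """Clause 1: close the difference-list with the empty list (None here)."""
    cell, out = flatten_dl(tree, None), []
    while cell is not None:                    # copy the cons cells to a Python list
        x, cell = cell
        out.append(x)
    return out

t = ("tree", ("leaf", "a"), ("tree", ("leaf", "b"), ("leaf", "c")))
print(flatten(t))
```

As in the logic program, no list is ever appended to another: each leaf adds exactly one cons cell in front of the tail it is given.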


2.3  The  operational  view  of  the  logical  variable  and  unification 

The  main  difference  between  logic  programming  and  other  computational  models  is  the  logical 
variable  and  its  manipulation  via  unification. 

The basic data-manipulation operation in logic programs, unification, results in a substitution.
Operationally, a substitution can be thought of as a simultaneous assignment of values to
variables, except that here:

•  a  variable  can  be  assigned  a  value  only  once,  and 

•  the  value  assigned  can  be  itself  another  variable  or  a  term  containing  variables. 

The  single-assignment  property,  the  ability  to  assign  one  variable  to  another,  and  the  ability 
to  assign  a  term  containing  variables  to  a  variable  are  all  fundamental  to  logic  programming,  and 
are  the  source  of  many  powerful  logic  programming  techniques. 
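The three properties just listed can be illustrated with a small Python model of a logical variable. The class Var and the functions deref, assign and show are hypothetical names introduced for this sketch; they are not taken from any system described in this paper.

```python
class Var:
    """A single-assignment logical variable."""
    def __init__(self, name):
        self.name, self.ref = name, None

def deref(t):
    """Follow variable-to-variable bindings to the current value."""
    while isinstance(t, Var) and t.ref is not None:
        t = t.ref
    return t

def assign(var, value):
    """Bind var to value; a variable that already has a value cannot be rebound."""
    var, value = deref(var), deref(value)
    if var is value:
        return                      # X = X: nothing to do
    if not isinstance(var, Var):
        raise ValueError("variable already has a value")
    var.ref = value

def show(t):
    """Render a term, substituting the values of bound variables."""
    t = deref(t)
    if isinstance(t, Var):
        return t.name
    if isinstance(t, tuple):
        return f"{t[0]}({','.join(show(a) for a in t[1:])})"
    return str(t)

X, Y, Z = Var("X"), Var("Y"), Var("Z")
assign(Z, ("pair", X, Y))    # the value is a term containing variables
assign(X, Y)                 # the value is itself another variable
before = show(Z)
assign(Y, "a")               # instantiating Y also determines X
after = show(Z)
print(before, after)
```

The term assigned to Z is built before its components are known and becomes fully determined only when Y receives a value; a further attempt to assign a different value to X is rejected.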

Since  the  basic  computational  step  of  a  logic  program  requires  the  unification  of  a  goal  atom 
with the head of a clause, much of the effort in logic programming has been devoted to
understanding both the implications of this operation and its efficient implementation. This study has
led  to  the  realization  that  many  of  the  subcases  of  goal/clause  unification  correspond  quite  closely 
to  basic  data  manipulation  operations  of  conventional  languages.  For  the  logic  programmer,  this 
implies  that  these  special  cases  can  be  used  to  achieve  the  effect  of  conventional  data  manipulation. 
For  the  logic  programming  language  implementor  this  implies  that  unification  can  be  implemented 
efficiently  by  compiling  the  special  cases,  when  identifiable,  into  machine  instructions  that  execute 
the  more  basic  data  manipulation  operations. 

The correspondence is illustrated in Figure 1. The left column enumerates basic
data-manipulation operations of conventional languages such as Pascal or Lisp, with sample code
fragments. The right column shows the corresponding special cases of unification, with the
corresponding examples of goal and clause terms. In this figure T = T' denotes the unification of the goal
atom term T with the clause head term T'.

Note that the cases in the figure are not necessarily mutually exclusive. For example, the
unification of a goal variable with a clause term both constructs the term and assigns it to the goal
variable; the unification of a goal term with an incomplete clause term both tests for equality and
performs data access.


2.4  Semantics  of  logic  programs 

We  provide  here  definitions  for  some  of  the  concepts  used  intuitively  above. 
Unification

A substitution is a function from variables to terms which is different from the identity function
on a finite number of variables. A substitution $\theta$ is presented as the finite set of pairs
$\{X_1 \mapsto T_1,\ldots,X_n \mapsto T_n\}$, where $X_1,\ldots,X_n$ are the variables on which $\theta$ is different from the identity
function, and $T_i=\theta(X_i)$, $i=1,\ldots,n$.

For any term T and substitution $\theta$, $T\theta$ denotes the term obtained by replacing every variable
X in T by $\theta(X)$. A term T is an instance of a term T' if $T=T'\theta$ for some substitution $\theta$. For
example, f(X,a), f(X,X), f(a,a), f(a,b), and f(f(Z),g(X)) are all instances of f(X,Y).

Conventional data manipulation operation      Corresponding special case of goal/clause unification

(Single-)Assignment                           Variable = Non-variable
                                              X = a

Equality testing                              Term = Term
a = b?                                        a = a

Data access                                   Compound term = Incomplete compound term
(e.g. car and cdr in Lisp, ↑ in Pascal)
X := car([a,b,c]), Xs := cdr([a,b,c])         [a,b,c] = [X|Xs]

Data construction                             Variable = Compound term
(e.g. cons in Lisp, new in Pascal)
Ys := cons(a,Xs)                              Ys = [a|Xs]

Parameter passing by value                    Goal Term = Variable
                                              f(a) = X

Parameter passing by reference                Variable = Variable
                                              X = Y

No corresponding operation;                   Two variables = Same variable
similar to aliasing                           (Y,Z) = (X,X)

Figure 1: Basic data manipulation operations and the corresponding
special cases in goal/clause unification

A substitution $\theta$ is more general than $\theta'$ if there is a substitution $\sigma$ such that $\theta' = \theta \circ \sigma$,
where $\circ$ denotes function composition. An equivalent condition is that $T\theta'$ is an instance of $T\theta$ for
any term T. For example, $\{X \mapsto Y\}$ is more general than $\{X \mapsto a, Y \mapsto a\}$ and $\{X \mapsto f(Z)\}$ is more
general than $\{X \mapsto f(a)\}$.

A substitution $\theta$ is a unifier of two terms $T_1$ and $T_2$ if $T_1\theta=T_2\theta$. For example, the
substitution $\{X \mapsto f(a), Y \mapsto a, Z \mapsto b\}$ is a unifier of p(X,b) and p(f(Y),Z), and so is the substitution
$\{X \mapsto f(Y), Z \mapsto b\}$.

A substitution $\theta$ is a most general unifier (mgu) of $T_1$ and $T_2$ if it is a unifier of $T_1$ and $T_2$
and is more general than any other unifier of $T_1$ and $T_2$.

In the previous example the second unifier is the most general one. The most general unifier
of m(X,[X|Xs]) and m(X',[a,b,c]) is $\{X \mapsto a, X' \mapsto a, Xs \mapsto [b,c]\}$, and the most general unifier of
a([X|Xs],Ys,[X|Zs]) and a([a,b,c],[d,e],Zs') is $\{X \mapsto a, Xs \mapsto [b,c], Ys \mapsto [d,e], Zs' \mapsto [a|Zs]\}$.

In the previous examples there was one most general unifier. The two terms f(X) and f(Y)
have two most general unifiers, $\{X \mapsto Y\}$ and $\{Y \mapsto X\}$.

A renaming is a substitution that permutes its domain. An example is $\{X \mapsto Y, Y \mapsto X\}$. It
can be shown that all most general unifiers are equivalent up to renaming, i.e. if $\theta$ and $\theta'$ are two
most general unifiers of some terms then there is a renaming $\rho$ such that $\theta = \theta' \circ \rho$. In addition,
it can be shown that if two terms have a most general unifier, then they have an idempotent most
general unifier, i.e. an mgu $\theta$ for which $\theta = \theta \circ \theta$.

We define a function mgu, which takes two terms and returns their set of idempotent most
general unifiers, if there are any, and fail if there are none. Usually we do not care which mgu is
employed; in such cases we write $mgu(T_1,T_2) = \theta$ instead of $\theta \in mgu(T_1,T_2)$.

For  a  detailed  analysis  of  unification  see  [108].  The  operational  intuitions  behind  unification 
were  elaborated  in  Section  2.3  above. 
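Composition of substitutions, idempotence and equivalence up to renaming can be checked on small examples with the following Python sketch. The dictionary encoding of substitutions and the names apply, compose and is_idempotent are assumptions made here for illustration; compose(theta, sigma) follows the convention used in this paper, applying theta first and then sigma.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def apply(t, theta):
    """Compute the instance of term t under substitution theta."""
    if is_var(t):
        return theta.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply(a, theta) for a in t[1:])
    return t

def compose(theta, sigma):
    """Return the substitution that applies theta, then sigma."""
    out = {v: apply(t, sigma) for v, t in theta.items()}
    for v, t in sigma.items():
        out.setdefault(v, t)                             # variables moved by sigma only
    return {v: t for v, t in out.items() if t != v}      # drop identity bindings

def is_idempotent(theta):
    return compose(theta, theta) == theta

mgu = {"X": ("f", "Y"), "Z": "b"}          # an mgu of p(X,b) and p(f(Y),Z)
instance = compose(mgu, {"Y": "a"})         # a less general unifier of the same terms
rho = {"X": "Y", "Y": "X"}                  # a renaming
print(instance, is_idempotent(mgu), compose({"Y": "X"}, rho))
```

Composing the second mgu of f(X) and f(Y) with the renaming yields the first, as the equivalence result states, and the renaming itself is an example of a substitution that is not idempotent.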
A transition system for logic programs

Transition systems will be employed throughout this paper. We specify a transition system for
logic programs, as well as general notions that will be used in subsequent transition systems for
concurrent logic programs. The general style of the transition system is that of Pnueli [142]; the
details are adapted from Gerth et al. [65].

Definition:  Transition  system  for  a  logic  program  P. 

We  associate  with  every  logic  program  P  a  transition  system  which  consists  of: 

•  A set of states.

A state is a pair $(G;\theta)$, where G (the goal) is either a sequence of atoms or fail, and $\theta$ is a
substitution.

•  A set of transitions.

A transition is a function from states to sets of states. For states S, S' and transition t, we
denote that $S' \in t(S)$ by $S \xrightarrow{t} S'$. The set includes the Reduce and Fail transitions defined in
Section 2.1 above.

Definition: Enabled transition, terminal state, success state, failure state.

•  A transition t is enabled on a state S if $t(S)$ is non-empty.

•  A state on which no transition is enabled is called a terminal state. A terminal state of the
form $(true;\theta)$ is called a success state, and $(fail;\theta)$ a failure state.

Definition:  Computation 

A  computation  of  a  program  P  on  a  goal  G  is  a  (finite  or  infinite)  sequence  of  states 

c = S₁, S₂, ...

satisfying:

• Initiation: S₁ = (G;ε), where ε is the empty substitution.

• Consecution: For each k, Sₖ₊₁ ∈ t(Sₖ) for some transition t.

• Termination: c is finite and of length k only if Sₖ is terminal.

Definition:  Partial  computation,  partial  answer  substitution. 

Any prefix of a computation is called a partial computation. The partial answer substitution of the
partial computation (G;ε), ..., (R;θ) is θ restricted to the variables of G.
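
The definitions above can be made concrete with a small sketch. The following Python fragment is an illustration only, under simplifying assumptions that are not part of the paper: atoms are propositional (so substitutions are omitted from states), a goal is a tuple of atom names, the empty tuple plays the role of true, and the Fail transition is assumed to apply whenever some goal atom has no clause. The names make_transitions, is_terminal and computation are hypothetical.

```python
import random

FAIL = "fail"  # the goal 'fail'; the empty tuple () stands for 'true'

def make_transitions(program):
    """program: list of (head, body) pairs over propositional atoms; body is a tuple."""
    heads = {head for head, _ in program}

    def reduce(state):
        # Replace any one goal atom by the body of a clause with that head.
        if state == FAIL:
            return set()
        return {state[:i] + body + state[i + 1:]
                for i, atom in enumerate(state)
                for head, body in program if head == atom}

    def fail(state):
        # Assumed Fail transition: some goal atom has no clause at all.
        if state == FAIL:
            return set()
        return {FAIL} if any(atom not in heads for atom in state) else set()

    return [reduce, fail]

def is_terminal(state, transitions):
    # A state is terminal when no transition is enabled on it.
    return not any(t(state) for t in transitions)

def computation(goal, transitions, rng, max_steps=100):
    """Return a sequence of states; one enabled successor is chosen at each step."""
    state = tuple(goal)
    trace = [state]
    for _ in range(max_steps):
        successors = sorted((s for t in transitions for s in t(state)), key=repr)
        if not successors:
            break
        state = rng.choice(successors)
        trace.append(state)
    return trace

program = [("p", ("q", "r")), ("q", ()), ("r", ())]
transitions = make_transitions(program)
trace = computation(["p"], transitions, random.Random(0))
```

Here every computation from the goal p ends in the success state (), whichever atom is reduced first. A trace cut off by max_steps corresponds to a partial computation in the sense defined above.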

Soundness  and  completeness  of  the  transition  system 

A rule that governs the And-nondeterministic choices, i.e. the choice of which goal atom to reduce
next,  is  called  a  computation  rule  [121].  Formally,  it  is  a  function  from  a  goal  to  one  of  its 
constituent  atoms.  A  computation  obeys  a  computation  rule  if  the  goal  atom  selected  at  each 
transition  is  the  one  specified  by  the  rule. 

Theorem:  Independence  of  the  computation  rule  [19,121]. 

Let P be a program and R a computation rule. If P has a successful computation on a goal G
with answer substitution θ, then it has a successful computation on G with answer substitution θ
that obeys R.

The transition system for logic programs realizes, in effect, a proof procedure for logic programs.
Each Reduce transition is actually an application of an inference rule, called SLD-resolution
[80,121], which is a special case of Robinson's resolution inference rule [147]. SLD-resolution, and
hence the transition system, have soundness and completeness properties that link their operational
view to the logical view of logic programs [121].

Notation: If A is an atom or a clause with variables X₁,X₂,...,Xₙ, (∀)A denotes
(∀X₁,X₂,...,Xₙ)A. If P is a program with clauses C₁,C₂,...,Cₙ then (∀)P is the conjunction

(∀)C₁ ∧ (∀)C₂ ∧ ... ∧ (∀)Cₙ.

Theorem: Soundness and completeness of SLD-resolution [80,19,121].

Let P be a program and A an atom.

1. (Soundness): If P has a computation on the initial goal A with answer substitution θ, then
(∀)Aθ is a logical consequence of (∀)P.

2. (Completeness): If (∀)A' is a logical consequence of (∀)P, where A' is an instance of the
atom A, then there is a computation of P on the initial goal A with answer substitution θ,
such that A' is an instance of Aθ.

Note that, in particular, if (∀)A is a logical consequence of (∀)P, then there is a computation
of P from A with answer substitution θ such that Aθ is equal to A up to renaming.

The soundness theorem relates a successful computation with a proof of a goal. Given a
program P, let $S_1 \xrightarrow{*} S_2$ denote that there is a partial computation of P leading from S₁ to S₂. A
partial computation from a unit goal can be viewed as a proof of a clause, whose head is the initial
goal, instantiated by the partial answer substitution, and whose body is the remaining goal, as shown
by the following lemma:

Lemma: If $(G;\varepsilon) \xrightarrow{*} (R;\theta)$ then (∀)(Gθ ← R) is a logical consequence of (∀)P.

Hence  every  partial  answer  substitution  can  be  thought  of  as  a  conditional  answer  to  the  query, 
whose condition is the yet-to-be-proved goal [144,205].

Program  equivalence  and  observables 

For simplicity, we assume the existence of some global vocabulary V, in which all programs and
goals are written.

A fundamental question in programming language semantics is when two programs should be
considered equivalent. For example, correctness of program transformation can be studied only
with respect to such a notion of equivalence.

Usually,  program  equivalence  is  defined  by  assigning  to  each  program  a  mathematical  object, 
called  its  meaning,  and  defining  two  programs  to  be  equivalent  if  they  have  the  same  meaning. 

The  meaning  of  a  program  is  usually  some  abstraction  of  its  possible  computations.  What  is 
abstracted  away  and  what  is  kept  is,  to  some  degree,  arbitrary,  and  depends  on  what  we  wish  to 
identify  as  the  observable  result  of  a  computation.  Hence  the  meaning  of  a  program  is  sometimes 
referred  to  as  its  observable  behavior,  or,  in  case  it  is  a  set,  as  its  observables  for  short. 

In the case of logic programs there are several possible notions of equivalence. One considers
successful computations. Define the success set of a program P to be the set of ground atoms from
which P has a successful computation. Two programs are success set equivalent if they have the
same success set.

Success set equivalence does not capture differences in the answer substitutions computed by
two programs. Define the answer substitution set of a program P to be the set of pairs (G,θ)
such that P has a successful computation from the goal G with answer substitution θ [45]. Two
programs are answer-substitution equivalent iff they have the same answer substitution set.
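
As a rough illustration of success set equivalence, the sketch below handles only ground (variable-free) programs, given as lists of (head, body) pairs. It relies on a standard result that is not stated in the text: for such programs the success set coincides with the least fixpoint obtained by repeatedly adding every head whose body atoms have already been derived. The function name success_set is hypothetical.

```python
def success_set(program):
    """Least set of atoms closed under the clauses of a ground program."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for head, body in program:
            # A head is derivable once all of its body atoms are derivable.
            if head not in derived and all(atom in derived for atom in body):
                derived.add(head)
                changed = True
    return derived

# Two syntactically different programs with the same success set {p, q}.
program_1 = [("p", []), ("q", ["p"])]
program_2 = [("q", []), ("p", ["q"])]
# A program whose only clause is p <- p has an empty success set.
program_3 = [("p", ["p"])]

equivalent = success_set(program_1) == success_set(program_2)
```

Because the programs are ground, answer substitutions play no role here, so this sketch says nothing about answer-substitution equivalence.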

2.5  Prolog 

Prolog is a concrete programming language based on the abstract logic programming model. Prolog
employs a procedural reading of logic programs, in which each goal atom is viewed as a procedure
call, and each clause A ← B₁,B₂,...,Bₙ is viewed as a definition of a procedure, similar to:

procedure A
begin
    call B₁,
    call B₂,
    ...
    call Bₙ
end

Such a clause is interpreted: “To execute procedure A, call B₁ and call B₂ and ... and call Bₙ”.
Prolog uses unification to realise various aspects of procedural languages such as parameter passing
by reference or by value, assignment, and data selection and construction, as was shown in Figure
1 above.

Formally,  this  operational  behavior  is  achieved  by  employing  a  computation  rule  that  selects 
the  leftmost  atom  in  a  goal,  thus  eliminating  And-nondeterminism.  Instead  of  the  Reduce  transition 
of  logic  programs,  Prolog  employs  the  following  transition  rule: 

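• $(A_1, A_2, \ldots, A_n;\ \theta) \longrightarrow ((B_1, \ldots, B_k, A_2, \ldots, A_n)\theta';\ \theta \circ \theta')$

if $\mathrm{mgu}(A_1, A) = \theta'$ for some renamed apart clause $A \leftarrow B_1, \ldots, B_k$ of P.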

The  resulting  transition  still  incorporates  Or-nondeterminism,  which  is  interpreted  in  Prolog  as 
implicit  search  for  all  solutions.  That  is,  Prolog  attempts  to  explore  all  computations  from  the 
initial  goal,  returning  the  answer  substitutions  of  successful  computations. 

Most  sequential  Prolog  systems  compute  the  solutions  to  a  goal  by  searching  depth-first  the 
computation  tree  induced  by  different  choices  of  clauses.  Typically,  one  solution  is  produced  at  a 
time,  and  additional  solutions  are  searched  for  only  by  request.  Under  this  behavior  it  is  possible 
for  a  program  to  produce  several  solutions,  and  then  diverge.  The  point  of  divergence  is  determined 
by  the  order  of  clause  selection.  Usually  a  Prolog  program  is  defined  as  a  sequence  (rather  than 
set)  of  clauses,  and  the  order  of  clause  selection  is  textual  order. 
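
The behavior described above, where a program yields several solutions on request and then diverges, can be sketched for propositional programs. This is an illustration under stated assumptions, not a model of any actual Prolog system: atoms carry no arguments, the leftmost atom is always selected, clauses are tried in textual order, and a hypothetical max_depth bound turns divergence into an exception so that it can be observed.

```python
def solve(goal, program, depth=0, max_depth=200):
    """Depth-first search with leftmost atom selection and textual clause order."""
    if depth > max_depth:
        raise RecursionError("search exceeded max_depth; the computation appears to diverge")
    if not goal:
        yield depth  # number of reductions in this successful computation
        return
    first, rest = goal[0], goal[1:]
    for head, body in program:  # clause order is textual order
        if head == first:
            yield from solve(body + rest, program, depth + 1, max_depth)

# q has two finite successful computations, followed by an infinite one.
program = [
    ("q", []),
    ("q", ["r"]),
    ("r", []),
    ("q", ["loop"]),
    ("loop", ["loop"]),
]

solutions = solve(["q"], program)
first = next(solutions)   # found via the clause  q.
second = next(solutions)  # found via the clauses  q <- r.  and  r.
# Requesting a third solution enters the branch  q <- loop  and never returns
# (here it raises RecursionError once max_depth is exceeded).
```

Moving the clause for q that calls loop to the front of the program would make the search diverge before any solution is produced, which is why clause order matters.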

The  possibility  of  divergence  in  the  face  of  both  successful  and  infinite  computations  makes 
Prolog incomplete as a proof procedure for logic programs (see Section 2.4). However, this
incompleteness is not a major problem in practice. Knowing the Prolog computation rule, Prolog
programmers order bodies of clauses so that infinite computations are avoided on expected goals.
In the example logic programs in Section 2.1 above, Prolog computations terminate on sum(Xs,S)
goals whose first argument is a complete list of numbers; on in_both(X,L1,L2) if both L1 and L2
are complete lists; and on flatten(T,Xs) if T is a complete tree.

Prolog is a convenient language for a large class of applications. However, to be practical
it augmented the pure logic programming model with extra-logical extensions [171]. The main
purpose  of  these  extensions  is  to  specify  input/output  and  to  realise  a  shared  modifiable  store.  As 
we  shall  see  later,  this  deficiency  is  peculiar  to  Prolog,  and  is  not  inherent  to  the  logic  programming 
model.  Indeed,  concurrent  logic  programs  can  specify  both  input/output  and  shared  modifiable 
store  in  a  pure  way,  relying  solely  on  their  different  computation  rule  and  different  interpretation 
of  nondeterminism. 


PART  II.  CORE  CONCEPTS  AND  TECHNIQUES 

3.  Concurrent  Logic  Programming 

Transformational vs. reactive languages

Prolog is a sequential programming language, designed to run efficiently on a von Neumann
machine by exploiting its ability to perform efficient stack management. Sequential Prolog can be
parallelised, and much research is devoted to effective ways of doing so [122,10,207]. Nevertheless,
Prolog, whether executed sequentially or in parallel, should not be termed a concurrent
programming language.

To understand why Prolog and other parallelisable sequential languages cannot be termed
concurrent languages, it is useful to distinguish between two types of systems, or programs:
transformational and reactive [71]. The distinction is closely related to the distinction between closed
and open systems [79]. A transformational (closed) system receives an input at the beginning
of its operation and yields an output at its end. On the other hand, the purpose of a reactive
(open) system is not necessarily to obtain a final result, but to maintain some interaction with its
environment. Some reactive systems, such as operating systems, database management systems,
etc., ideally never terminate, and in this sense do not yield a final result at all.

All  classical  sequential  languages  in  general,  and  Prolog  in  particular,  were  designed  with 
the transformational view in mind. These languages contain some basic interactive input/output
capabilities,  but  usually  these  capabilities  are  not  an  integrated  component  of  the  language  and 
sometimes,  as  in  Prolog,  are  completely  divorced  from  its  basic  model  of  computation. 

It  may  seem  that  the  distinction  between  transformational  and  reactive  systems  is  not  directly 
related  to  concurrent  systems,  and  perhaps  there  could  be  concurrent  transformational  systems  as 
well  as  concurrent  reactive  ones.  Indeed,  there  are  concurrent  systems  that  exploit  parallelism  to 
achieve  high  performance  in  applications  that  are  transformational  in  nature,  such  as  the  solution 
of large numerical problems. Following Harel [70], we call concurrent systems that are
transformational as a whole parallel systems. However, if we investigate the components of any concurrent
system  —  whether  transformational  or  reactive  as  a  whole  —  we  find  these  components  to  be 
reactive;  they  maintain  continuous  interaction  at  least  with  each  other  and  possibly  also  with  the 
environment. 

Hence, there seems to be a common aspect to all concurrent systems or algorithms, independently
of their target architecture, and of whether they exploit concurrency to achieve higher
performance,  physical  distribution,  or  better  interaction  with  their  environment.  The  common 
aspect  is  that  a  language  that  describes  and  implements  them  needs  to  specify  reactive  processes 
—  their  creation,  interconnection,  internal  behavior,  communication  and  synchronization. 

Don’t-know and don’t-care nondeterminism

Many abstract computational models are nondeterministic, including nondeterministic Turing
machines, nondeterministic finite automata, and logic programs. Reactive systems are also
nondeterministic. However, the nature of nondeterminism in the former is very different from the one
employed in the latter. Kowalski [109] aptly termed nondeterminism of the first type don’t-know
nondeterminism, and of the second type don’t-care nondeterminism³. Don’t-care nondeterminism
is often also called indeterminism, and we will use these two notions interchangeably.

The  don’t-know  interpretation  of  nondeterminism  implies  that  the  programmer  need  not  know 
which  of  the  choices  specified  in  the  program  is  the  correct  one;  it  is  the  responsibility  of  the 
execution  of  the  program  to  choose  right  when  several  transitions  are  enabled.  Formally,  this  is 
achieved  by  specifying  results  of  only  successful  computations  as  observable.  Examples  of  such 
observables are the set of strings accepted by a nondeterministic finite automaton, or the pairs of
goals and answer substitutions of successful computations of a logic program.

Don’t-know nondeterminism is a very convenient tool for specifying transformational closed
systems, as witnessed by the Prolog language. However, it seems to be incompatible with reactive
open systems. The essence of don’t-know nondeterminism is that failing computations “don’t
count”, and only successful computations may produce observable results. However, it is not
possible, in general, to know in advance whether a computation will succeed or fail; hence a
don’t-know nondeterministic computation cannot produce partial output before it completes; and hence
it cannot be reactive⁴.

The don’t-care interpretation of nondeterminism, on the other hand, requires that results of
failing computations be observable. Hence a don’t-care nondeterministic computation may produce
partial output (partial answer substitutions, in the case of concurrent logic programs) even if it is
not known whether the computation will eventually succeed or fail.

³ Manna and Pnueli [125] call the first existential nondeterminism and the second universal
nondeterminism.

⁴ A related argument with a similar conclusion is given by Ueda [202].

Don’t-care nondeterminism seems to be unnecessary, sometimes even a nuisance, in the
specification of transformational systems, but as we shall see it is essential in the specification of
concurrent reactive systems.

Although the nondeterminism of abstract computational models is commonly interpreted as
don’t-know nondeterminism, such models are also open to the don’t-care interpretation. For
example, nondeterministic finite automata can be used to specify either formal languages [88]
(don’t-know nondeterminism), or finite-state reactive systems (don’t-care nondeterminism) [125]. The
logic programming model is also open to these two interpretations. Prolog takes the don’t-know
interpretation, whereas concurrent logic languages, being geared for specifying reactive open
systems, take the don’t-care interpretation.

Formally, the two interpretations of nondeterminism induce different notions of equivalence on
the set of programs. Assume some notion of equivalence of two (either failing or successful)
computations. For example, in logic programs two computations on the same initial goal are equivalent
if they have the same answer substitution and the same mode of termination. Under the don’t-know
interpretation, two programs are equivalent if they have equivalent successful computations. Under
the don’t-care interpretation, two programs are equivalent if they have equivalent computations,
whether successful or not.
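
The difference between the two notions of equivalence can be sketched as follows. This is a deliberately coarse illustration: each program is abstracted, for one fixed goal, as a finite set of (answer substitution, mode of termination) pairs, with answers written as plain strings. The representation and the function names are assumptions made for the example only.

```python
def successful(computations):
    """Keep only the answers of successful computations."""
    return {answer for answer, mode in computations if mode == "success"}

def dont_know_equivalent(c1, c2):
    # Only successful computations are observable.
    return successful(c1) == successful(c2)

def dont_care_equivalent(c1, c2):
    # Every computation is observable, whatever its mode of termination.
    return set(c1) == set(c2)

# Both programs can succeed with X=a, but the second may also commit to a
# computation that binds X to b and then fails.
p1 = {("X=a", "success")}
p2 = {("X=a", "success"), ("X=b", "failure")}

same_under_dont_know = dont_know_equivalent(p1, p2)
same_under_dont_care = dont_care_equivalent(p1, p2)
```

Under these assumptions the two programs are equivalent under the don’t-know interpretation but not under the don’t-care interpretation, since only the latter observes the failing computation.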

We emphasize that concurrent logic languages are not unique in adopting the don’t-care
interpretation of nondeterminism. Rather, almost all models of concurrency and concurrent
programming languages, including CSP [86,87], CCS [129], UNITY [16], Occam [91], Ada, and others,
take this approach as well. The difference is that concurrent logic languages have as an ancestor an
abstract nondeterministic computational model, namely logic programs, whose nondeterminism
can be interpreted both as don’t-know and as don’t-care. The other concurrent models and
languages do not have related models or languages which incorporate don’t-know nondeterminism,
hence for them the questions addressed here are usually not raised.

One active research direction in logic programming explores parallel (non-reactive) languages
that incorporate both don’t-know and don’t-care nondeterminism [209,210,150,153,154,157,72,8,
179]. The goal of these languages is to execute logic programs more efficiently by exploiting
determinism, more sophisticated control, and parallelism. This research direction is outside the
scope of the survey. It is discussed further in Chapter 21.

What  are  concurrent  logic  languages? 

Concurrent  logic  languages  are  logic  programming  languages  that  can  specify  reactive  open  systems, 
and  thus  can  be  used  to  implement  concurrent  systems  and  parallel  algorithms.  A  concurrent  logic 
program  is  a  don't-care  nondeterministic  logic  program  augmented  with  synchronisation.  A  logic 
program  thus  augmented  can  realise  the  basic  notions  of  concurrency  —  processes,  communication, 
synchronisation,  and  indeterminism. 

The process reading of logic programs [42], employed by concurrent logic programs, is different
from the procedural reading employed by Prolog and mentioned in Section 2.5. In the process
reading of logic programs each goal atom p(T₁,...,Tₙ) is viewed as a process, whose program
state (“program counter”) is the predicate p/n and data state (“process registers”) is the sequence
of terms T₁,...,Tₙ. The goal as a whole is viewed as a network of concurrent processes, whose
process interconnection pattern is specified by the logical variables shared between goal atoms.
Processes communicate by instantiating shared logical variables and synchronise by waiting for
logical variables to be instantiated. This view is summarised in Figure 2.
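
To give a feel for the process reading, the sketch below imitates two processes that share a logical variable, using Python threads. It is an analogy under stated assumptions, not an implementation of any concurrent logic language: the hypothetical LogicVar class models a single-assignment variable whose readers block until it is instantiated, and a stream is modelled as a chain of (head, tail variable) cells closed by None.

```python
import threading

class LogicVar:
    """Single-assignment variable: bound at most once; readers wait until bound."""
    def __init__(self):
        self._bound = threading.Event()
        self._value = None

    def bind(self, value):
        if self._bound.is_set():
            raise ValueError("variable already instantiated")
        self._value = value
        self._bound.set()

    def read(self):
        self._bound.wait()  # synchronise: suspend until instantiated
        return self._value

def producer(n, stream):
    # Incrementally instantiate the stream to [0, 1, ..., n-1].
    for i in range(n):
        tail = LogicVar()
        stream.bind((i, tail))  # communicate: instantiate the shared variable
        stream = tail
    stream.bind(None)  # close the stream

def consumer(stream, total):
    acc = 0
    while True:
        cell = stream.read()
        if cell is None:
            break
        head, stream = cell
        acc += head
    total.bind(acc)

def run(n):
    xs, s = LogicVar(), LogicVar()
    threads = [threading.Thread(target=consumer, args=(xs, s)),
               threading.Thread(target=producer, args=(n, xs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return s.read()
```

Calling run(5) starts the consumer before the producer; the consumer simply waits on the shared variable until the producer instantiates it, so the order in which the two threads are started does not affect the result.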

Process                           Goal atom
Process network                   Goal (collection of atoms)
Instruction for process action    Clause (see Figure 3)
Communication channel;
shared location                   Shared logical variable
Communication                     Instantiation of a shared variable
Synchronization                   Wait until a shared variable is sufficiently instantiated

Figure 2: The process reading of logic programs

The possible behaviors of a process are specified by guarded Horn clauses, which have the
form:

Head ← Guard | Body.

The head and guard specify the conditions under which the Reduce transition can use the clause,
as well as the effect of the transition on the resulting state. This is explained further below. The
body specifies the state of the process after taking the transition: a process can halt (empty body),
change state (unit body), or become several concurrent processes (a conjunctive body). This is
summarized in Figure 3.


Halt:
    A ← G | true.

Change (data and/or program) state (i.e., become a different process):
    A ← G | B.

Become k concurrent processes:
    A ← G | B₁,...,Bₖ.

Figure 3: Clauses as instructions for process behavior


Concurrent logic languages employ the don’t-care interpretation of nondeterminism. Intuitively,
this means that once a transition has been taken the computation is committed to it, and
cannot backtrack or explore other alternatives in parallel. Formally, this is realized by making
observable the partial results of the computation, as well as the final results of successful, failing,
and deadlocked computations [65], as explained in Section 9 below.

The head and guard of a guarded clause specify conditions on using the clause for reduction.
A guarded clause can be used to reduce a goal atom only if the conditions specified by the head and
the guard are satisfied by the atom. Concurrent logic languages differ in what can be specified by
the head and the guard. A flat concurrent logic language incorporates a set of primitive predicates;
in the languages surveyed these include mainly equality, inequality and arithmetic predicates. A
guard in a flat language consists of a (possibly empty) sequence of atoms of these predicates. In a
non-flat language, on the other hand, the guard may contain both primitive and defined predicates,
and thus guard computations may be arbitrarily complex. Since guards of a non-flat language are
recursively defined by guarded clauses, a computation of it forms an And/Or-tree of processes.
In a flat language the processes are a “flat” collection; hence their name. Flat languages have
received most of the recent attention of researchers, because it was found that their simplicity
and amenability to efficient implementation come at a relatively low cost in expressiveness and
convenience, when compared to non-flat languages (discussed in Section 18).

Concurrent processes communicate by instantiating shared logical variables, and synchronize
by waiting for variables to be instantiated. Variable instantiation is realized in most concurrent
logic languages by unification. Three approaches were proposed to the specification of
synchronization in concurrent logic programming: input matching (also called input unification, one-way
unification, or just matching) [20,24,198], read-only unification [160], and determinacy conditions
[210]. All share the same general principle: the reduction of a goal atom with a clause may be
suspended until the atom's arguments are further instantiated. Once the atom is sufficiently
instantiated, the reduction may become enabled or terminally disabled, depending on the conditions
specified by the head and guard. Since input matching is the simplest and most useful
synchronization mechanism, we present it here and defer the discussion of the others till the languages
that employ them are introduced.

The matching of a goal atom A with a head of a clause A' ← G | B succeeds if A is an instance
of A'; in such a case it returns a most general substitution θ such that A = A'θ. It fails if the goal
atom and the head are not unifiable. Otherwise it suspends. More precisely,

$$
\mathrm{match}(A,A') =
\begin{cases}
\theta & \text{if } \theta \text{ is the most general substitution such that } A = A'\theta \\
\mathit{fail} & \text{if } \mathrm{mgu}(A,A') = \mathit{fail} \\
\mathit{suspend} & \text{otherwise.}
\end{cases}
$$

Unlike  unification,  there  is  only  one  most  general  matching  substitution.  Using  matching  for 
reducing  a  goal  with  a  clause  delays  the  reduction  until  the  goal  is  sufficiently  instantiated,  so 
that  its  unification  with  the  clause  head  can  be  completed  without  instantiating  goal  variables. 
Examples  are  given  in  Figure  4. 


Goal             Clause head      Result

p(a)             p(X)             {X↦a}
p(X)             p(a)             suspend
p(a)             p(b)             fail
...              sum([X|Xs],S)    ...
sum(In,Out)      sum([X|Xs],S)    suspend
sum([ ],Out)     sum([X|Xs],S)    fail

Figure 4: Examples of input matching of goals with clause heads


The dataflow nature of matching is evident: an “instruction” (clause) is enabled as soon as
sufficient  “data”  (variable  instantiations)  arrive.  Although  simple,  matching  is  found  in  practice 
sufficiently  powerful  for  all  but  the  most  complex  synchronization  tasks,  as  demonstrated  by  the 
programming  techniques  in  Section  7. 
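
The three outcomes of input matching can be sketched in a few lines of Python. The term representation is an assumption made for this example: variables are strings starting with an upper-case letter, constants are other strings or integers, and a compound term is a tuple whose first element is the functor, so that [X|Xs] is written ("cons", "X", "Xs") and [ ] is written "nil". The goal and the clause head are assumed to be renamed apart. All function names are hypothetical.

```python
FAIL, SUSPEND = "fail", "suspend"

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # Follow variable bindings in the substitution s.
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """Full unification; returns a substitution or None if not unifiable."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return None if occurs(b, a, s) else {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def one_way(goal, head, s):
    """Bind head variables only; None if the goal is not an instance of the head."""
    if is_var(head):
        if head in s:
            return s if s[head] == goal else None
        return {**s, head: goal}
    if is_var(goal):
        return None  # success would require instantiating a goal variable
    if (isinstance(goal, tuple) and isinstance(head, tuple)
            and goal[0] == head[0] and len(goal) == len(head)):
        for g, h in zip(goal[1:], head[1:]):
            s = one_way(g, h, s)
            if s is None:
                return None
        return s
    return s if goal == head else None

def match(goal, head):
    if unify(goal, head, {}) is None:
        return FAIL      # goal and head are not unifiable
    theta = one_way(goal, head, {})
    return SUSPEND if theta is None else theta

sum_head = ("sum", ("cons", "X", "Xs"), "S")
```

With this representation, match(("p", "a"), ("p", "X")) returns the substitution {"X": "a"}, match(("p", "X"), ("p", "a")) suspends, and match(("sum", "nil", "Out"), sum_head) fails. The sketch runs full unification first only to separate failure from suspension; an implementation would normally not need two passes.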

Languages  in  the  concurrent  logic  programming  family  differ  mainly  in  the  capabilities  of  their 
output  mechanism.  On  one  end  of  the  spectrum  there  are  languages  that  allow  only  matching 
prior  to  clause  selection  and  perform  unification  past  clause  selection.  On  the  other  end  there  are 
languages  which  allow  both  matching  and  unification  as  tests  prior  to  such  a  commitment.  Test 
unification  in  its  most  general  form  subsumes  powerful  synchronization  mechanisms  used  in  more 
conventional  models  such  as  multiple  simultaneous  test-and-set  and  CSP-like  output  guards. 

These  differences  and  others  are  further  elaborated  upon  when  discussing  the  various  languages 
in  Part  III  of  the  paper.  Until  then  we  concentrate  on  the  common  aspects  of  the  family. 

4.  FCP(|)  —  A  Simple  Concurrent  Logic  Programming  Language 

We illustrate the various aspects of concurrent logic programming discussed in the previous section
using a simple concurrent logic language, FCP(|) (read “FCP-commit”)*. FCP(|) is closely related
to Flat GHC [198] and to Oc (read “Oh see”) [81,83]. We use FCP(|) as the introductory language
instead of the more familiar language Flat GHC since its definition is simpler, and since it can more
easily express some of the programming techniques related to distributed termination detection,
discussed in Section 7. However, all programs shown in Sections 5 and 7 are legal Flat GHC
programs as well, and, except for the termination detection programs, the difference between
the behavior of these programs under the operational semantics of Flat GHC and of FCP(|) is
immaterial. See the discussion of Flat GHC in Section 10.

* The nomenclature we use to describe concurrent logic languages is influenced by the one used by
Saraswat [150], but is different from it.

4.1  Syntax 

Definition:  Guard  test  predicates,  guarded  clause,  FCP(|)  program. 

• We assume a fixed finite set of guard test predicates, including integer(X), X < Y, X = Y,
X ≠ Y, and others. The predicates assumed in this paper are given in Section 4.2 below.

• A guarded clause is a formula of the form:

A ← G₁,...,Gₘ | B₁,...,Bₙ.    m,n ≥ 0,

where A, G₁,...,Gₘ, B₁,...,Bₙ are atoms, the predicate of each Gᵢ, i = 1,...,m, is a guard
test predicate, and the variables of Gᵢ occur in A. If the guard is empty (m = 0) then the
commit operator ‘|’ is omitted. An empty body (n = 0) is denoted by true.

• An FCP(|) program is a finite sequence of guarded clauses, which contains the unit clause

X = X

as the only clause with head predicate ‘=’.

Note: ‘=’ is a primitive predicate in FCP(|) that cannot be redefined by a program. The reason
for a program being a sequence of clauses, rather than a set, will become apparent when we discuss
the otherwise predicate in Section 7.

4.2  Operational  semantics 

Modelling  concurrency  by  interleaving  atomic  actions 

We  specify  the  behavior  of  concurrent  logic  programs  in  general,  and  FCP(|)  programs  in  particular, 
using a transition system very similar to that of logic programs. In this standard approach [142,16],
concurrency  is  modelled  by  the  nondeterministic  interleaving  of  the  atomic  actions  of  the  processes 
participating  in  the  computation.  The  approach  requires,  therefore,  a  precise  specification  of  what 
is  an  atomic  step  of  execution,  as  differences  in  the  grain  of  atomic  actions  may  lead  to  radically 
different  computational  models.  As  we  shall  see,  one  of  the  major  differences  between  the  various 
concurrent  languages  is  indeed  the  grain  of  their  atomic  actions. 

Our  transition  system  is  not  reactive:  it  does  not  model  input  from  an  outside  environment. 
This  is  not  a  major  drawback,  since  if  we  wish  to  investigate  a  reactive  computation  of  a  program 
P from a goal G, we can model the environment as another process G', whose behavior is specified
by a program, say E, with predicates disjoint from P, and investigate computations of the program
P ∪ E from the conjunctive goal (G,G') [59]. An alternative is to add an explicit input transition
[119]. 

Modeling  concurrency  by  interleaving  is  a  common  approach,  which  has  the  advantage  of 
being  simple  and  well  understood.  Its  disadvantage  is  that  concurrency  is  not  explicit,  and  hence 
an  interleaving  model  sometimes  gives  rise  to  artificial  fairness  problems,  which  are  not  present  if 
the  concurrency  is  explicit  in  the  model.  We  defer  the  discussion  of  fairness  to  Section  6. 

Guard  test  predicates  and  guard  checking 

The  meaning  of  the  guard  test  predicates  is  given  via  a  fixed  set  of  ground  atoms  T  over  these 
predicates.  The  predicates  used  in  this  paper  and  their  meanings  are: 

X=X          for every ground term X.

X≠Y          for every two ground terms X and Y which are not equal.

integer(X)   for every integer X.

X<Y          for every ground arithmetic expressions X and Y
             such that the value of X is less than the value of Y.

X≤Y          for every ground arithmetic expressions X and Y such
             that the value of X is less than or equal to the value of Y.

X =:= Y      for every ground arithmetic expressions X and Y
             whose values are the same.

X \=:= Y     for every ground arithmetic expressions X and Y
             whose values are different.


There are three guard primitives, var(X), unknown(X), and otherwise, whose semantics cannot be
given simply by a set of ground atoms. Their semantics is discussed when the primitives are
introduced. The set of guard test predicates of “real” concurrent logic languages is not much larger
than the list above. See for example [100,170].

An atom is true in T if every instance of it is in T, and false otherwise. A conjunction of
atoms is true in T if all its members are true in T, and false otherwise. The check of a guard G
succeeds if G is true in T, it fails if no instance of G is true in T, and it suspends otherwise. In
other words, the check suspends if some future instantiation of the guard may result in it being
true. For example:

checking integer(3) succeeds
checking integer(a) fails
checking integer(X) suspends

checking 3 < 5 succeeds
checking 5 < 3 fails
checking 3 < X suspends
checking a < X fails
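
The three outcomes of a guard check can be sketched in Python. This is an illustrative model only, not part of FCP(|): the names Var, check_integer and check_less are hypothetical, unbound variables are represented by Var objects, constants such as a by Python strings, and arithmetic expressions are restricted to plain integers.

```python
class Var:
    """An unbound logical variable (hypothetical helper for this sketch)."""
    def __init__(self, name):
        self.name = name

SUCCEED, FAIL, SUSPEND = "succeeds", "fails", "suspends"

def check_integer(x):
    # integer(X): undecided while X is unbound, since X may still become an integer.
    if isinstance(x, Var):
        return SUSPEND
    return SUCCEED if isinstance(x, int) and not isinstance(x, bool) else FAIL

def check_less(x, y):
    # X < Y: a non-numeric constant can never become an arithmetic value,
    # so the check fails even if the other argument is still unbound.
    for arg in (x, y):
        if not isinstance(arg, (Var, int)):
            return FAIL
    if isinstance(x, Var) or isinstance(y, Var):
        return SUSPEND
    return SUCCEED if x < y else FAIL
```

The last case mirrors the final example above: no instantiation of X can make a < X true, so the check fails rather than suspends.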

The  clause  try  function 

The only difference between the transition system of logic programs and of FCP(|) is that the
Reduce and Fail transitions employ matching and guard checking instead of unification. The
operation of matching and guard checking is captured by the clause try function, try.

The mgu function unifies the goal atom with the clause head, and returns a substitution or
fail. The try function does the same if the goal atom is an equality and the clause is the equality
clause X=X. Otherwise it matches the goal atom and clause head, and, if successful, checks the
guard, instantiated by the matching substitution. It may return suspend if the matching or the
guard check suspends. try is defined for equality goals as follows:


try(T1=T2, X=X) = mgu(T1,T2).


And  for  clauses  whose  head  predicate  is  different  from  the  equality  predicate: 


try(A, (A' ← G | B)) =

    θ        if match(A,A') = θ ∧ checking Gθ succeeds
    fail     if (match(A,A') = θ ∧ checking Gθ fails) ∨ mgu(A,A') = fail
    suspend  otherwise
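
The case analysis above can be sketched in Python under simplifying assumptions. Everything named here (Var, HVar, match, try_clause, integer_guard) is hypothetical: compound terms are tuples whose first element is the functor, goal variables are Var objects, head variables are HVar objects, and clause heads are assumed to be linear (no repeated variables), so that a functor clash found during matching coincides with mgu(A,A') = fail.

```python
class Var:
    """Unbound variable occurring in the goal atom."""
    def __init__(self, name):
        self.name = name

class HVar:
    """Variable of the renamed-apart clause head."""
    def __init__(self, name):
        self.name = name

FAIL, SUSPEND = "fail", "suspend"

def match(goal, head, theta):
    """One-way unification: bind head variables only; never instantiate the goal."""
    if isinstance(head, HVar):
        theta[head.name] = goal
        return theta
    if isinstance(goal, Var):
        return SUSPEND                      # goal not yet instantiated enough
    if isinstance(head, tuple) and isinstance(goal, tuple):
        if head[0] != goal[0] or len(head) != len(goal):
            return FAIL
        suspended = False
        for g, h in zip(goal[1:], head[1:]):
            result = match(g, h, theta)
            if result == FAIL:              # a clash anywhere means failure
                return FAIL
            suspended = suspended or result == SUSPEND
        return SUSPEND if suspended else theta
    return theta if goal == head else FAIL

def try_clause(goal, head, guard):
    """guard maps a substitution to 'succeeds', 'fails' or 'suspends'."""
    theta = match(goal, head, {})
    if theta in (FAIL, SUSPEND):
        return theta
    outcome = guard(theta)
    if outcome == "succeeds":
        return theta
    return FAIL if outcome == "fails" else SUSPEND

def integer_guard(theta):
    """Guard integer(X) for a hypothetical clause p(f(X)) <- integer(X) | ..."""
    x = theta["X"]
    if isinstance(x, Var):
        return "suspends"
    return "succeeds" if isinstance(x, int) else "fails"
```

For the hypothetical head p(f(X)), the goal p(f(3)) yields the substitution binding X to 3, p(f(a)) fails on the guard, and both p(f(V)) and p(V) suspend.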


The  transition  system 

The state of a computation of an FCP(|) program is, as in a logic program, a pair (G;θ), where
G is a goal and θ a substitution. A computation begins from an initial state (G;ε) and progresses
using Reduce and Fail transitions, similar to computations of logic programs. The difference is
that the Reduce and Fail transitions relate the goal atom to the clause head using the clause try
function, try, instead of the mgu function:




1.  Reduce

(A1,...,Ai,...,An; θ) --> ((A1,...,B1,...,Bk,...,An)θ'; θ∘θ')

If try(Ai,C) = θ' for some renamed apart clause C = A ← G | B1,...,Bk of P.

2.  Fail

(A1,...,Ai,...,An; θ) --fail--> (fail; θ)

If for some i and for every renamed apart clause C of P, try(Ai,C) = fail.

Note that the suspend result of the try function is not used in the Reduce and Fail
transitions. Its effect, therefore, is to prevent a goal atom from reducing with a clause and from
failing. Specifically, if A is a goal atom for which try(A,C)=suspend for some clause C in P, and
try(A,C')=suspend or fail for every other clause C' in P, then A can participate neither in a
Reduce transition nor in a Fail transition. Such a goal atom is called suspended. A state consisting
of a goal in which all atoms are suspended is terminal, as no transition applies to it. Such a state
is called a deadlock state, and a computation ending in a deadlock state is called a deadlocked
computation⁶.
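
The role of the suspend result can be summarised by a small Python sketch. The representation is an assumption made for illustration: each goal atom is described by the list of try results obtained against every clause of P, where a dict stands for a substitution and the strings "fail" and "suspend" stand for the other two outcomes. The function names classify and is_deadlock are hypothetical.

```python
def classify(results):
    """Classify one goal atom from its try results against all clauses of P."""
    if any(isinstance(r, dict) for r in results):
        return "reducible"       # a Reduce transition applies
    if all(r == "fail" for r in results):
        return "failed"          # a Fail transition applies
    return "suspended"           # at least one suspend, no substitution

def is_deadlock(goal):
    """A non-empty goal is deadlocked when every atom in it is suspended."""
    return len(goal) > 0 and all(classify(results) == "suspended" for results in goal)
```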

The  following  lemma  relates  successful  computations  of  an  FCP(|)  program  to  computations 
of  the  corresponding  logic  program. 

Lemma  (Soundness  of  FCP(|)): 

Let c be a non-deadlocked computation of an FCP(|) program P. Then there is a finite subset T'
of T such that c is also a computation of the logic program P∪T'.

Proof:  Immediate.  | 

The opposite direction of this lemma is of course not true. In particular, the logic program can
proceed with a Reduce transition on states in which the corresponding FCP(|) program deadlocks.

Observables  of  concurrent  logic  programs 

As  explained  in  Section  3,  the  observables  of  a  concurrent  logic  program  reflect  both  successful 
and  failing  computations. 

Definition:  Observables of a concurrent logic program

The observable behavior of a finite computation c = (G1;ε),...,(Gn;θn) of a concurrent logic
program P is the triple (G1,θ,x), where θ is the answer substitution of the computation (i.e. θn
restricted to variables of G1), and x=Gn if Gn=true or Gn=fail, and x=deadlock otherwise. The
observables of a concurrent logic program P, [[P]], are the set of observable behaviors of every
computation c of P.  |

4.3  Examples  of  concurrent  logic  programs 

We  show  several  simple  examples  of  concurrent  logic  programs,  written  in  FCP(|),  that  correspond 
to  the  logic  programs  shown  in  Section  2.2,  and  describe  their  behavior. 

The following concurrent logic program defines the process sum(Xs,S), which unifies S with
the sum of the elements of the input stream Xs.

sum(Xs,S) ←                  %1
    sum'(Xs,0,S).
sum'([ ],P,S) ← P=S.         %2
sum'([X|Xs],P,S) ←           %3
    plus(X,P,P'),
    sum'(Xs,P',S).

⁶ Note, however, that this terminology is appropriate only for a closed computation. In an open computation
variables may be instantiated by the environment, and hence a more refined definition of deadlock is required,
which takes into account which variables are accessible to the environment.

There are two differences between this program and the logic program for sum shown in Section
2.2. The first is that the base clause of sum' unifies the partial sum P with the answer S explicitly
in the body. This is necessary since in FCP(|) the goal atom is matched with the clause head, not
unified with it. The second is that the definition of plus has to be modified to reflect the direction
of the computation. plus behaves as if defined by clauses of the form:

plus(0,0,X) ← X=0.
plus(0,1,X) ← X=1.
plus(2,2,X) ← X=4.
...

Note also that each clause has an implicit commit operator “|”, which is omitted by our syntactic
conventions since the guard is empty. In contrast to the logic program for sum, the concurrent
logic program has only successful computations on an initial goal where Xs is a list of
integers and S is a variable.

Consider  the  following  partial  computation: 


(sum([1,2],S); ε)
    --Reduce(1,1)-->
(sum'([1,2],0,S); ε)
    --Reduce(1,3)-->
(plus(1,0,P), sum'([2],P,S); ε)
    --Reduce(2,3)-->
(plus(1,0,P), plus(2,P,P'), sum'([ ],P',S); ε)


At this state the logic program could reduce any of the three atoms. However, the concurrent logic
program cannot reduce the second plus process: it is suspended until its second argument P is
instantiated. A possible continuation of this partial computation is:


(plus(1,0,P), plus(2,P,P'), sum'([ ],P',S); ε)
    --Reduce(1,plus)-->
(P=1, plus(2,P,P'), sum'([ ],P',S); ε)
    --Reduce(3,2)-->
(P=1, plus(2,P,P'), P'=S; ε)
    --Reduce(3,=)-->
(P=1, plus(2,P,S); ε)
    --Reduce(1,=)-->
(plus(2,1,S); ε)
    --Reduce(1,plus)-->
(S=3; ε)
    --Reduce(1,=)-->
(true; {S↦3})

The  logic  program  for  flattening  a  tree  can  also  be  easily  turned  into  a  concurrent  logic  program. 
The  only  syntactic  change  required  is  specifying  the  construction  of  the  output  difference-list 
explicitly,  by  an  equality  goal : 


flatten(T,Xs) ←                    %4
    flatten'(T,Xs\[ ]).
flatten'(leaf(X),Xs\Ys) ←          %5
    Xs=[X|Ys].
flatten'(tree(L,R),Xs\Zs) ←        %6
    flatten'(L,Xs\Ys),
    flatten'(R,Ys\Zs).

The flatten process can operate on a tree that is provided incrementally, by a concurrent process,
as the two clauses of flatten' suspend until it is known whether their first argument is leaf(_) or
tree(_,_). It can also be connected to the sum process, which incrementally sums the list produced
by flatten, as shown by the following computation:


(flatten(tree(leaf(17),leaf(19)),Xs), sum(Xs,S); ε)
    --Reduce(1,4)-->
(flatten'(tree(leaf(17),leaf(19)),Xs\[ ]), sum(Xs,S); ε)
    --Reduce(1,6)-->
(flatten'(leaf(17),Xs\Ys), flatten'(leaf(19),Ys\[ ]), sum(Xs,S); ε)
    --Reduce(1,5), Reduce(1,=)-->
(flatten'(leaf(19),Ys\[ ]), sum([17|Ys],S); {Xs↦[17|Ys]})
    --Reduce(2,1)-->
(flatten'(leaf(19),Ys\[ ]), sum'([17|Ys],0,S); {Xs↦[17|Ys]})
    --Reduce(2,3)-->
(flatten'(leaf(19),Ys\[ ]), plus(17,0,P'), sum'(Ys,P',S); {Xs↦[17|Ys]})
    --Reduce(2,plus)-->
(flatten'(leaf(19),Ys\[ ]), P'=17, sum'(Ys,P',S); {Xs↦[17|Ys]})
    --Reduce(2,=)-->
(flatten'(leaf(19),Ys\[ ]), sum'(Ys,17,S); {Xs↦[17|Ys]})
    --Reduce(1,5), Reduce(1,=)-->
(sum'([19],17,S); {Xs↦[17,19]})
    --Reduce(1,3)-->
(plus(19,17,P''), sum'([ ],P'',S); {Xs↦[17,19]})
    --Reduce(1,plus)-->
(P''=36, sum'([ ],P'',S); {Xs↦[17,19]})
    --Reduce(2,2)-->
(P''=36, P''=S; {Xs↦[17,19]})
    --Reduce(1,=)-->
(36=S; {Xs↦[17,19]})
    --Reduce(1,=)-->
(true; {Xs↦[17,19], S↦36})


Summing the elements of a tree can be done more efficiently by combining the two procedures,
flatten and sum, into a single procedure tree_sum:

% tree_sum(T,S) ←
%     S is the sum of values of the leaves of the tree T.

tree_sum(T,S) ←
    tree_sum'(T,0,S).

tree_sum'(tree(L,R),P,S) ←
    tree_sum'(L,P,P'),
    tree_sum'(R,P',S).
tree_sum'(leaf(X),P,S) ←
    plus(X,P,S).

This program spawns a network of linearly connected plus processes, which sum the
leaf elements sequentially from left to right. A possible computation from the initial
goal tree_sum(tree(tree(leaf(11),leaf(17)),leaf(13)),S) may contain the intermediate goal
plus(11,0,P'), plus(17,P',P''), plus(13,P'',S), which is then reduced from left to right. (Note,
however, that the leftmost plus process may be reduced even if the spawning of the plus processes
to its right has not been completed yet.) The program demonstrates that in a recursively
constructed process tree, leaf processes (plus processes in our case) can communicate directly even if
they are not directly related.

Turning the in_both logic program into a concurrent logic program is more difficult, since
it employs don't-know nondeterminism in an essential way: it guesses a member of one list,
and verifies that it is also a member of the second. The following logic program does not
need to guess which clause to use, if the two lists are given in the initial goal, as in the goal
in_both(Xs,[1,2,3,4],[3,4,5,6]). In such a case all computations of the following program are
deterministic: each goal atom created during the computation of the program unifies with exactly one
clause head. The program employs the difference-list technique.

% in_both(Xs,L1,L2) ←
%     The list Xs contains the members of both lists L1 and L2.

in_both(Xs,[X|L1],L2) ←
    member(X,L2,Xs\Xs'),
    in_both(Xs',L1,L2).
in_both([ ],[ ],_).

% member(X,L,Xs\Xs') ←
%     X is a member of L and Xs = [X|Xs'] or X is not a member of L and Xs = Xs'.

member(X,[X|L],[X|Xs]\Xs).
member(X,[Y|L],Xs\Xs') ←
    X≠Y, member(X,L,Xs\Xs').
member(X,[ ],Xs\Xs).

In contrast to the previous program, this program returns the (possibly empty) list of elements
common to the two input lists, rather than nondeterministically selecting a common single element
if one exists, or failing if there is none. Here in_both(Xs,L1,L2) holds if Xs is the list of all elements
common to L1 and L2. The multiplicity of elements in Xs is the same as in L1. This program
employs a difference-list to construct the output. member(X,L,Xs\Xs') holds if both X is in
L and X is the difference between Xs and Xs' (i.e. Xs = [X|Xs']) or if X is not in L and the
difference between Xs and Xs' is empty (i.e. Xs = Xs').

Since  the  program  is  deterministic  on  the  desired  set  of  goals,  it  can  be  turned  into  a  concurrent 
logic  program  quite  easily.  The  following  is  such  an  FCP(|)  program: 

in_both(Xs,[X|L1],L2) ←
    member(X,L2,Xs\Xs'),
    in_both(Xs',L1,L2).
in_both(Xs,[ ],_) ← Xs=[ ].

member(X,[X|L],Xs\Xs') ← Xs=[X|Xs'].
member(X,[Y|L],Xs\Xs') ←
    X≠Y | member(X,L,Xs\Xs').
member(X,[ ],Xs\Xs') ← Xs=Xs'.

Intuitively, the program operates as follows. On a call in_both(Xs,L1,L2) it spawns parallel
member(X,L2,Xs\Xs') processes, one for each element of L1, using the recursive clause of in_both.
Each of these processes searches down the list L2 for an element equal to its X. If it finds one,
it returns X in the difference-list Xs\Xs', by unifying Xs with [X|Xs']. Otherwise it returns the
empty difference-list by unifying Xs with Xs'. The difference-lists are implicitly concatenated into
the output list by the recursive clause of in_both. The output list is closed by the base clause of
in_both.

The program operates correctly even if the two input lists L1 and L2 are given incrementally,
by some concurrent process. This is achieved since the program inspects them using matching
(specified by clause heads), which suspends if the input list is still unavailable. The output, in
contrast, is constructed using unification, specified explicitly in the body of the clauses.

The guard of the second clause of member ensures that the clause is selected only after it is
determined that X is different from Y. In the other clauses the guard is empty and the commit
operator is implicit.
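
The input/output relation computed by this program can be checked against a sequential Python sketch. It is only an approximation made for illustration: the function names follow the FCP(|) program, but the concurrency, the suspension on unavailable input, and the difference-list plumbing are all lost, and each member call simply returns its contribution to the output list.

```python
def member(x, l2):
    """Return [x] if x occurs in l2, otherwise the empty contribution []."""
    for y in l2:
        if x == y:        # first clause: the head matches, Xs=[X|Xs']
            return [x]
        # second clause: the guard X≠Y holds, so continue down the list
    return []             # third clause: end of list reached, Xs=Xs'

def in_both(l1, l2):
    """Concatenate the contributions of one member search per element of l1."""
    out = []
    for x in l1:
        out.extend(member(x, l2))
    return out
```

For example, in_both([1,2,3,4],[3,4,5,6]) evaluates to [3,4], and in_both([1,1],[1]) evaluates to [1,1], since every occurrence in the first list triggers its own search of the second.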

4.4  The  power  of  the  logical  variable  in  concurrent  programming 

The  standard  uses  of  the  logical  variable  and  unification  were  mentioned  in  Section  2.3.  Concurrent 
logic  programming  extends  its  use  also  to  process  communication.  By  following  specific  conventions 
and protocols, a wide range of concurrent programming techniques can be realized using shared
logical  variables,  unification,  and  matching. 

In  this  section  we  provide  a  glossary  of  the  major  uses  of  the  logical  variable  in  concurrent 
programming.  These  will  be  demonstrated  by  concrete  examples  throughout  the  paper.  When 
applicable,  references  to  the  relevant  sections  are  provided. 

It  is  useful  to  view  variables  shared  between  processes  and  their  instantiation  in  two  com¬ 
plementary  ways:  as  communication  channels,  which  transmit  message  streams,  and  as  shared 
locations,  which  are  instantiated,  possibly  incrementally  and  cooperatively,  to  compound  data 
structures.  Both  are  explained  below. 

Shared  logical  variables  as  communication  channels,  and  communication  stream  protocols 
Given  the  single-assignment  nature  of  logical  variables,  it  may  seem  that  a  communication  channel 
implemented  by  a  shared  logical  variable  might  carry  at  most  one  message.  In  some  sense  this 
is  true.  A  better  way  to  understand  the  situation,  however,  is  to  view  a  shared  logical  variable 
as a Genie, who will grant you a single wish⁷. A good strategy to follow when encountering such
a  Genie  is,  of  course,  to  request  to  have  two  wishes.  This  is  realized  by  instantiating  the  logical 
variable  to  a  list  (cons)  cell,  whose  head  and  tail  are  logical  variables.  The  head  may  be  used  to 
send  the  current  message.  Its  tail  is  a  new  variable,  shared  by  the  processes  sharing  the  original 
variable, and can be used for subsequent communications ad infinitum, as in

Xs = [m1|Xs1], Xs1 = [m2|Xs2], Xs2 = [m3|Xs3], ...

In this way multiple “wishes” are achieved by the multiplicity of elements in a single list.
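
The construction can be imitated in Python with an explicit single-assignment cell. This is a sketch under stated assumptions, not a model of any FCP implementation: LogicVar, send and received are hypothetical names, a cons cell is a Python pair, and closing the stream with the empty list is not modelled.

```python
class LogicVar:
    """Single-assignment variable: it may be bound at most once."""
    _UNBOUND = object()

    def __init__(self):
        self.value = LogicVar._UNBOUND

    def bound(self):
        return self.value is not LogicVar._UNBOUND

    def bind(self, value):
        if self.bound():
            raise ValueError("variable already instantiated")
        self.value = value

def send(tail, message):
    """Instantiate the stream tail to the cell [message|NewTail]; return NewTail."""
    new_tail = LogicVar()
    tail.bind((message, new_tail))
    return new_tail

def received(stream):
    """Collect the messages sent so far, stopping at the first unbound tail."""
    messages = []
    while stream.bound():
        head, stream = stream.value
        messages.append(head)
    return messages
```

Each call to send spends the one “wish” of the current tail and obtains a fresh variable in exchange, so a reader holding the original variable sees every message in order.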

Hence,  when  serving  as  a  communication  channel,  a  shared  logical  variable  is  typically  instan¬ 
tiated  to  a  stream  of  messages.  Several  protocols  can  be  followed  in  constructing  such  a  message 
stream.  They  differ  in  the  number  of  processes  sharing  the  variable,  and  whether  they  share  the 
writing  and/or  reading  of  the  stream.  Useful  stream  communication  protocols  include: 

•  point-to-point communication (single-writer single-reader), e.g. the stream connecting
flatten and sum in Section 4.3 above,

•  broadcast communication (single-writer multiple-reader) (Section 7.7),

•  duplex communication (two writers/readers, who use the stream both for bidirectional
communication and for tight synchronization) (Section 14),

•  many-to-one communication (multiple-writer single-reader) (Section 15), and

•  blackboard communication (multiple writer/reader, cooperatively reading and writing the
stream) (Section 15).

These  stream  protocols  require  progressively  stronger  synchronization  mechanisms.  Only  the 
first  two  can  be  implemented  by  all  concurrent  logic  languages.  The  language  properties  required 
to  realize  the  duplex  protocol  and  multiple-producer  protocols  are  described  when  the  protocols 
are  introduced. 

A single stream is not always the preferred data structure for communication. For high-volume
many-to-one communication, nondeterministic merging of multiple single-writer streams is usually
preferred over having multiple writers cooperatively produce a single stream, since it eliminates
contention on the stream’s tail. Stream merging is discussed in Sections 6 and 7. In addition, when
the set of writers and the set of readers in a multiple-writer multiple-reader stream are disjoint, it is
often the case that the total ordering a stream imposes on its elements unnecessarily serialises the
sending and receiving of messages. In such a case, a more general data-structure, called Channel,
may be appropriate [194]. A Channel is a partially ordered set of messages, which can be produced
in parallel without contention, and can be read with the same degree of parallelism with which it
was produced.

⁷ This analogy is due to Bill Silverman.


Incomplete-message  protocols 

Messages  sent  on  a  stream  need  not  be  ground  (i.e.  variable-free)  terms.  A  message  containing 
variables  is  called  an  incomplete  message.  The  ability  to  send  incomplete  messages  is  perhaps  the 
single most important reason for the added flexibility and expressiveness of concurrent logic
programming languages over other concurrent languages. D.H.D. Warren once characterised Prolog
as “pointers made easy”. Adapted to concurrent logic programming, the slogan may read
“communication channels made easy”. Indeed, incomplete messages may be viewed as a structured and
high-level way of dynamically allocating and distributing communication channels.

There  are  several  useful  protocols  employing  incomplete  messages: 

•  Back-communication  protocol  (the  “remote  procedure  call”  effect)  (Section  7.3). 

A sender sends a message with a newly allocated reply variable R, and waits for R to be
instantiated.  The  receiver  responds  to  the  message  by  instantiating  R  to  the  reply.  This 
protocol  achieves  essentially  the  effect  of  a  remote-procedure-call  mechanism,  without  adding 
any  special  constructs  to  the  language. 

It  is  easy  to  program  servers  using  many-to-one  message  streams  and  the  back-communication 
protocol.  There  is  no  need  for  the  server  to  know  the  identity  of  the  client  who  sent  the  request, 
nor  for  specially  programmed  mechanisms  to  route  replies  back  to  the  clients. 

If  a  server  can  serve  multiple  requests  in  parallel,  then  a  many-to-many  channel  may  be 
preferred  to  a  stream. 

•  Dialogue  (Section  7.4). 

The  back-communication  protocol  constitutes  only  one  round  in  a  possibly  longer  dialogue. 
The reply to which the variable R is instantiated may contain another reply variable R',
with  which  the  original  sender  can  continue  the  dialogue,  repeating  the  back-communication 
protocol  as  long  as  desired  by  both  parties. 

•  Network  formation  protocol  (Section  7.2). 

Assume that two processes p and c sharing a variable X want each to become n processes
p1,...,pn and c1,...,cn, and establish a communication stream Xi between pi and ci, i =
1,...,n. This can be achieved as follows. p sends to c the message [X1,...,Xn], which contains
n variables, and then creates n processes, providing the i-th process with Xi. Upon receipt of
the message c creates n processes and similarly provides its i-th process with Xi.

In a variant of this protocol, the stream [X1,...,Xn] is constructed incrementally by p. The
process c need not know the stream’s length in advance, and can create the next process ci
when Xi is available. This technique is heavily used in the formation of recursive process
networks, as described in Section 7.2.

•  Network  reconfiguration  protocol. 

Assume  a  process  p  that  shares  a  variable  Q  with  a  process  q  and  a  variable  R  with  a  process  r. 
If  p  wants  to  establish  direct  communication  between  q  and  r  it  sends  to  q  on  the  stream  Q  an 
incomplete  message  containing  R.  Once  q  receives  that  message,  it  can  use  R  to  communicate 
with  r  directly. 

This  technique  can  be  employed  to  form  an  arbitrary  communication  graph  in  a  network, 
independently  of  how  the  network  was  created  to  begin  with.  In  particular,  in  a  recursively 
constructed  network  the  communication  graph  need  not  follow  the  path  of  the  recursion,  and 
two “leaf” processes may communicate directly no matter how high up their common ancestor
is  (Section  4.3). 

•  Bounded-buffer communication protocol (Section 7.3).

The basic stream communication protocol is asynchronous. However, using incomplete
messages one can implement synchronised communication. For example, a k-bounded-buffer
protocol, for k > 1, can be implemented using a single-writer single-reader stream of incomplete
messages as follows [177]: Each message contains an acknowledgement variable. The reader
acknowledges the message by instantiating the variable to some constant upon receipt. The
writer does not send the (n+k)th message before receiving an acknowledgement for the nth
message.
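
The bounded-buffer rule can be illustrated by a deterministic, single-threaded Python sketch. The class and function names (Ack, BoundedWriter, read) are hypothetical, the acknowledgement variable is reduced to a boolean flag, and a send that the protocol would suspend is reported by returning False instead of blocking.

```python
class Ack:
    """Acknowledgement variable carried by a message; instantiated by the reader."""
    def __init__(self):
        self.done = False

class BoundedWriter:
    def __init__(self, k):
        self.k = k
        self.sent = []      # list of (payload, Ack): the stream written so far

    def can_send(self):
        # The next message has 0-based index len(sent); it may be sent only
        # once the message k positions earlier has been acknowledged.
        n = len(self.sent) - self.k
        return n < 0 or self.sent[n][1].done

    def send(self, payload):
        if not self.can_send():
            return False    # the writer would suspend here
        self.sent.append((payload, Ack()))
        return True

def read(writer, position):
    """Reader side: take the message at `position` and acknowledge it."""
    payload, ack = writer.sent[position]
    ack.done = True
    return payload
```

With k = 2 the writer can run two messages ahead of the reader; a third send succeeds only after the first message has been read and acknowledged.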

Shared  logical  variables  as  shared  locations 

A  shared  logical  variable  can  also  be  viewed  as  a  shared  location  that  can  be  assigned  a  data 
structure,  i.e.  a  logical  term.  The  term  may  be  compound,  and  its  construction  may  proceed 
incrementally  and  in  cooperation  between  the  processes  initially  sharing  the  variable. 

In  the  special  case  that  the  term  is  a  stream  we  obtain  the  stream  communication  protocols 
described  above.  However,  the  fact  that  stream  communication  results  in  a  data  structure,  and  is 
not  just  a  sequence  of  events  that  occur  in  time,  has  several  ramifications.  Two  of  them  are: 

•  Communication history can be kept and later examined.

A  shared  variable  used  as  a  communication  channel  is  incrementally  instantiated  to  a  stream 
data  structure,  which  contains  the  messages  and  replies  sent  on  it.  Typically,  a  stream  writer 
or  reader  iterates  with  the  tail  of  the  stream  once  its  head  was  written  or  read,  and  eventually 
the  head  becomes  inaccessible.  Memory  occupied  by  inaccessible  data-structures  is  eventually 
reclaimed  by  garbage-collection.  Alternatively  the  initial  stream  variable  can  be  kept,  either 
by  the  process  communicating  via  the  stream  itself,  or  by  a  concurrent  observer  who  shares 
the  initial  stream  variable.  The  data  structure  kept  by  the  observer  can  be  used  later  for 
various  purposes,  such  as  debugging,  logging,  and  recovery. 

The  stream  data  structure  reflects  only  the  order  in  which  messages  were  sent  and  their 
content,  but  does  not  record  the  order  in  which  message  subterms,  including  replies,  were 
constructed.  In  addition,  if  a  process  communicates  via  several  independent  streams,  their 
content  cannot  be  used  to  determine  the  temporal  relations  between  messages  on  different 
streams. 

For  some  applications  this  abstract  form  of  communication  history,  represented  by  a  stream 
term,  is  sufficient.  If  a  more  precise  history  of  the  computation  is  required,  e.g.,  to  diagnose 
transient  timing  bugs,  a  different  technique  for  recording  information  about  a  computation, 
which  is  sufficient  for  its  accurate  reconstruction,  can  be  used  [118].  This  is  further  discussed 
in  Section  14. 

•  A  message  stream  can  be  inspected  and  transformed. 

A  process  may  examine  or  transform  its  incoming  stream  before  processing  its  messages,  if  it 
so desires. A simple illustration of this is the ability of a process to “send to self”, a useful
object-oriented  programming  paradigm.  To  do  so  a  process  prepends  a  message  to  its  input 
stream  and  proceeds  with  the  resulting  stream. 


The  constructed  term  need  not  be  a  stream,  however.  For  example,  in  the  distributed  database 
system of Reches et al. [144], a transaction is a tree-structured term that is constructed
cooperatively by the user program and the database system. The user program constructs the term,
possibly  concurrently,  out  of  terms  corresponding  to  sub-transactions,  leaving  in  it  variables  for 
the  database  system  replies.  The  database  system  consumes  the  term,  executing,  possibly  concur¬ 
rently,  the  subtransactions,  and  instantiates  the  reply  variables  to  the  answers. 

Another  example  is  the  type-checker  of  Yardeni  [211].  In  it,  multiple  processes  cooperate  in 
constructing  a  term  representing  a  finite  automaton  that  defines  the  type  of  the  program  being 
checked. The programming technique used is similar to the multiple-writer stream described in
Section 15.


5.  Basic Programming Examples and Techniques


This  section  examines  the  operational  behavior  of  FCP(|)  programs  via  examples,  and  illustrates 
basic  concurrent  logic  programming  techniques. 


Writers  and  readers 

A  writer  process  p(X),  that  unifies  X  with  a  and  halts,  can  be  defined  using  the  single  clause 
program: 

p(X) ← X = a.    %1

A reader c(X) that waits till X is a and then halts is defined by:

c(a).    %2

A computation starting from a writer and a reader connected by the variable X progresses as
follows:

(p(X), c(X); ε)
    --Reduce(1,1)-->
(X=a, c(X); ε)
    --Reduce(1,=)-->
(c(a); {X↦a})
    --Reduce(1,2)-->
(true; {X↦a})


The  final  state  is  a  success  state,  and  this  is  the  only  possible  computation  from  that  initial  state. 

Many readers may read the same value, giving the effect of a broadcast. The computation
starting from the initial state:

(p(X), c(X), c(X), ..., c(X); ε)

reduces p(X), unifies X = a, and then reduces all the c(a) goal atoms one by one in some arbitrary
order.


Nondeterminism  in  writers  and  readers 

A process p2(X) that nondeterministically chooses to unify X with a or with b is defined by:

p2(X) ← X = a.    %1
p2(X) ← X = b.    %2

There are two possible computations if we use the nondeterministic writer p2 instead of p in
the example above: a successful one, essentially identical to the one above, and a failing one:

(p2(X), c(X); ε)
    --Reduce(1,2)-->
(X=b, c(X); ε)
    --Reduce(1,=)-->
(c(b); {X↦b})
    --Fail(1)-->
(fail; {X↦b})


If instead of c(X) we use a nondeterministic reader c2(X), which accepts either a or b as values
for X:

c2(a).
c2(b).

then instead of failing this latter computation would proceed and terminate successfully.

The process c2(X) has two alternatives. Which one is taken is completely determined by
its environment. p2(X) also has two alternatives. However, the environment has no effect on its
choice. An intermediate example is the process cp(X,Y,Z), which behaves as follows. If X = a it
unifies Z with a. If Y = b it unifies Z with b. If both X = a and Y = b it nondeterministically chooses
one of the two.

cp(a,Y,Z) ← Z=a.
cp(X,b,Z) ← Z=b.

Starting  from  the  initial  state: 

(p2(X), p2(Y), cp(X,Y,Z); ε)

there are several possible computations, depending on the choice of the goal atom and the clause.
To focus on clause choices, assume that goal atoms are reduced from left to right. Then there are
five possible computations. Three of the four choices of the two p2 processes uniquely determine
the behavior of cp(X,Y,Z). For example, if p2(X) unifies X = a and p2(Y) unifies Y = a then
cp must unify Z = a. If X = b and Y = a, then cp fails and the computation fails. However, if
p2(X) chooses X = a and p2(Y) chooses Y = b, then cp has a choice: it can either reduce with
the first clause, and unify Z = a, or with the second, and unify Z = b. Both computations are
possible.
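
The count of five computations can be verified by brute-force enumeration. The Python sketch below is only a bookkeeping aid with hypothetical function names: it assumes left-to-right reduction, so both writers have chosen before cp is tried, and it represents the constants a and b by one-character strings.

```python
from itertools import product

def cp_outcomes(x, y):
    """Possible results of cp(X,Y,Z) once X and Y are known: values of Z, or 'fail'."""
    outcomes = []
    if x == "a":
        outcomes.append("a")     # first clause:  cp(a,Y,Z) <- Z=a
    if y == "b":
        outcomes.append("b")     # second clause: cp(X,b,Z) <- Z=b
    return outcomes or ["fail"]

def computations():
    """All (X, Y, result) triples; the two writers choose independently."""
    return [(x, y, z)
            for x, y in product("ab", repeat=2)
            for z in cp_outcomes(x, y)]
```

The enumeration yields one computation each for the choices (a,a), (b,a) and (b,b), and two for (a,b), where both clauses of cp apply.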

Streams:  producers,  consumers,  transducers,  distributors,  and  mergers 
As  mentioned  in  Section  3,  a  stream  is  a  list  constructed  incrementally. 

Stream  producers 

Assume a process X := E, which evaluates the arithmetic expression E when it becomes ground,
and unifies X with its value (it can be defined in FCP(|) using more primitive arithmetic processes,
such as plus shown in Section 4.3). A process integers(From,To,Ns), which, given integers From
and To, produces the stream [From,From+1,...,To], can be defined by:

% integers(From,To,Ns) ← Ns is the list of integers from From to To⁸.

integers(From,To,Ns) ← From > To | Ns=[ ].
integers(From,To,Ns) ← From ≤ To | Ns=[From|Ns'],
    From' := From + 1,
    integers(From',To,Ns').

A more interesting stream producer is fib(N,Ns), which produces the elements of the Fibonacci
series less than or equal to N.

% fib(N,Ns) ← Ns is the Fibonacci series less than or equal to N.

fib(N,Ns) ←
    fib'(N,0,1,Ns).

fib'(N,N1,N2,Ns) ← N < N1 | Ns=[ ].
fib'(N,N1,N2,Ns) ← N ≥ N1 | Ns=[N1|Ns'],
    N3 := N1+N2,
    fib'(N,N2,N3,Ns').
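For readers more used to generator-based languages, the two producers can be approximated in Python. This is only a sketch, under the assumption that a lazily consumed generator stands in for an incrementally instantiated stream; the function names mirror the FCP(|) processes, and each guard becomes an ordinary loop condition.

```python
def integers(frm, to):
    """Yield frm, frm+1, ..., to (nothing at all if frm > to)."""
    n = frm
    while n <= to:            # guard: From <= To
        yield n               # Ns = [From|Ns']
        n += 1                # From' := From + 1


def fib(n):
    """Yield the Fibonacci numbers less than or equal to n."""
    n1, n2 = 0, 1
    while n >= n1:            # guard: N >= N1
        yield n1              # Ns = [N1|Ns']
        n1, n2 = n2, n1 + n2  # N3 := N1 + N2, then recurse on (N2, N3)
```

A consumer pulls elements one at a time, for example with `list(fib(10))`, which loosely corresponds to a consumer process waiting for the next cell of the stream to be instantiated.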

A process that sums the elements of its input stream was defined in Section 4.3.

The  following  process  reads  two  vectors  of  equal  length,  represented  by  streams  of  numbers, 
and  computes  their  inner  product.  It  will  form  the  building  block  of  a  matrix-multiplication 
program,  shown  in  Section  7.2. 

% ip(Xs,Ys,S) ← S is the inner product of Xs and Ys.

ip(Xs,Ys,S) ←
    ip1(Xs,Ys,0,S).

ip1([ ],[ ],P,S) ← P=S.
ip1([X|Xs],[Y|Ys],P,S) ←
    P' := P + X*Y,
    ip1(Xs,Ys,P',S).

8 The comments associated with concurrent logic programs explain only their logical reading: the reactive aspects
are usually explained in the text.

The following process multiplies its input stream by some integer, to produce an output stream:

% multiply(In,N,Out) ←
    Out is the stream resulting from multiplying each element of the stream In by N.

multiply([ ],N,Out) ← Out=[ ].
multiply([X|In],N,Out) ← Out=[Y|Out'],
    Y := X*N,
    multiply(In,N,Out').

The following transducer filters all multiples of an integer from a stream. It is a building block of
the parallel Sieve of Eratosthenes, shown in Section 7.2.

% filter(In,P,Out) ←
    Out is the stream resulting from deleting all multiples of P from the stream In.

filter([X|In],P,Out) ← 0 =\= X mod P | Out=[X|Out'],
    filter(In,P,Out').
filter([X|In],P,Out) ← 0 =:= X mod P |
    filter(In,P,Out).
filter([ ],P,Out) ← Out=[ ].
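The two transducers can be sketched as Python generators, again assuming that a generator approximates a stream. The name `filter_multiples` is chosen here only to avoid shadowing Python's built-in `filter`; it is not part of the original program.

```python
def multiply(stream, n):
    """Yield each element of stream multiplied by n."""
    for x in stream:
        yield x * n                 # Y := X*N, Out = [Y|Out']


def filter_multiples(stream, p):
    """Yield the elements of stream that are not multiples of p."""
    for x in stream:
        if x % p != 0:              # guard: 0 =\= X mod P
            yield x                 # Out = [X|Out']
        # otherwise (0 =:= X mod P) the element is dropped
```

Because each transducer consumes one iterator and produces another, they compose directly, e.g. `filter_multiples(multiply(range(1, 6), 3), 2)`.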

A  stream  distributor 

The following stream distributor has one input stream and two output streams. If an input message
send(1,X) is received, X is sent on the first output stream, and if send(2,X) is received, X is sent
on the second output stream. A variant of this program is used in the msg message sending system
shown in Section 7.1.

% distribute(In,Out1,Out2) ←
    In is a stream of elements of the form send(1,_) and send(2,_). Out1 is the stream of X's such
    that send(1,X) is in In, and Out2 is the stream of X's such that send(2,X) is in In.

distribute([send(1,X)|In],Out1,Out2) ← Out1=[X|Out1'],
    distribute(In,Out1',Out2).
distribute([send(2,X)|In],Out1,Out2) ← Out2=[X|Out2'],
    distribute(In,Out1,Out2').
distribute([ ],Out1,Out2) ← Out1=[ ], Out2=[ ].

A  deterministic  stream  merger 

The  following  process  receives  two  ordered  lists  of  integers,  and  produces  an  ordered  merge  of 
them.  A  variant  of  it  is  used  in  the  solution  to  Hamming’s  problem  and  in  mergesort  in  Section  7. 

% omerge(In1,In2,Out) ←
    If In1 and In2 are ordered streams of numbers, then Out is an ordered merge of In1 and In2.

omerge([X|In1],[Y|In2],Out) ← X ≤ Y | Out=[X|Out'],
    omerge(In1,[Y|In2],Out').
omerge([X|In1],[Y|In2],Out) ← X > Y | Out=[Y|Out'],
    omerge([X|In1],In2,Out').
omerge([ ],In2,Out) ← In2=Out.
omerge(In1,[ ],Out) ← In1=Out.
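The behavior of the ordered merger can be illustrated by the following Python generator. It is a sketch rather than a translation: it assumes both inputs are already sorted iterables, and it resolves ties in favour of the first input, which is a choice made for this example.

```python
def omerge(in1, in2):
    """Yield an ordered merge of two sorted iterables."""
    end = object()                      # private marker for a closed stream
    it1, it2 = iter(in1), iter(in2)
    x, y = next(it1, end), next(it2, end)
    while x is not end and y is not end:
        if x <= y:                      # first clause: take from In1
            yield x
            x = next(it1, end)
        else:                           # second clause: take from In2
            yield y
            y = next(it2, end)
    # Base cases: one input is closed, so the output is the rest of the other.
    if x is not end:
        yield x
        yield from it1
    if y is not end:
        yield y
        yield from it2
```

The loop only decides once the heads of both inputs are known, which corresponds to the merger process suspending until both X and Y are available.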


6.  Fairness 

A nondeterministic stream merger

The following is a nondeterministic stream merger. Its output stream is some order-preserving
interleaving of its two input streams.

% merge(In1,In2,Out) ← The stream Out is an interleaving of the streams In1 and In2.

merge([X|In1],In2,Out) ← Out=[X|Out'],
    merge(In1,In2,Out').
merge(In1,[X|In2],Out) ← Out=[X|Out'],
    merge(In1,In2,Out').
merge([ ],In2,Out) ← In2=Out.
merge(In1,[ ],Out) ← In1=Out.

The nondeterministic merge process thus defined guarantees nothing about the rate at which it
serves its two input streams. In particular, if one of the streams is unbounded then it is possible,
according to the semantics of FCP(|) defined in Section 4 above, that only elements of that stream
will be copied to the output stream. Furthermore, if the merger (or any other process) is executed
in parallel with a nonterminating process, e.g. p ← p, then there is no guarantee that it will reduce
at all.

A fairness requirement states conditions under which an event that may happen must eventually
happen. The purpose of incorporating fairness requirements into the definition of a language
is to provide the programmer with confidence that even in the presence of nondeterminism and
unbounded computations certain program steps will eventually occur.

In concurrent logic programming it is useful to distinguish two types of fairness: And-fairness
and Or-fairness. An And-fairness requirement states conditions under which a certain process
would eventually be reduced; thus it constrains And-nondeterminism. An Or-fairness requirement
states conditions under which a certain clause would eventually (not) be taken, thus constraining
Or-nondeterminism.

An And-fairness requirement should guarantee, for example, that even in the presence of diverging
processes, deterministic stream consumers will eventually read all their stream elements;
similarly for producers. However, And-fairness cannot provide such a guarantee for nondeterministic
consumers such as stream mergers or interrupt handlers. This is the purpose of Or-fairness
requirements. Or-fairness requirements should allow, for example, specifying a fair stream merger
and an interrupt handler in the language. Together, And-fairness and Or-fairness requirements
should allow one to compose a controlling process and an interruptible process (e.g. in the style
of the computation controller and the interrupt-handling meta-interpreter shown in Section 7)
and guarantee that the controller process can interrupt the controlled process even if the latter is
nonterminating. The following fairness requirements achieve this. Their definition can be skipped
without loss of continuity.

And-fairness

For simplicity we restrict the discussion to computations whose initial state has a unit goal. Let
P be a program and c = (G; ε), ... be a computation from the unit goal G. Let t be the maximal
number of atoms in the body of any clause in P. We label goal atoms in the computation with
strings in {1,...,t}*. The initial goal is labeled with the empty string. Let A be a process labeled
s, which is reduced using the clause A' ← B_1,...,B_n. Then each of the new atoms B_i in the new
goal is labeled with the string s·i.

We extend the Reduce transition label to contain p, the label of the reduced process, and θ,
the try substitution restricted to variables of the process p, as in Reduce(p,θ). We extend the Fail
transition label to contain the label of the failing process p, as in Fail(p).

Definition:  And-fairness. 

A computation c is And-fair if there is no Reduce(p,θ) transition or Fail(p) transition which is
almost always enabled on the states of c.⁹ ■

Notes:

1) Since the state of a process changes in a monotonic way (it can only be instantiated further),
if Reduce(p,θ) or Fail(p) are infinitely often enabled, they are also almost always enabled,
hence there is no distinction in this case between weak fairness (also called justice) and strong
fairness [50,142].

2)  We  have  defined  the  fairness  condition  by  ruling  out  certain  computations  allowed  by  the 
transition  system.  An  alternative  approach  is  to  define  a  transition  system  which  generates 
only  fair  computations  to  begin  with.  The  approach  of  Costa  and  Stirling  [32]  to  weak  fairness 
in  CCS  can  be  applied  here  as  well. 

Or-fairness

In  programs  implementing  stream  merging  and  interrupt  handling,  complete  freedom  in  clause 
selection  would  result  in  undesirable  behaviors:  a  merger  can  ignore  one  of  its  input  streams 
indefinitely;  a  process  may  ignore  a  message  on  its  interrupt  stream  for  arbitrarily  long.  Several 
approaches  to  constraining  clause  selection  in  such  programs  were  suggested;  none  of  them  seems 
completely  satisfactory. 

A global fairness requirement, which states that all clauses in a program satisfying certain conditions
should be used eventually, seems unreasonable, because of the dynamic nature of processes
and the fact that multiple processes may share the same set of clauses. Therefore approaches which
specify conditions on the selection of a clause by a process were pursued.

One approach is to impose preference on clauses, and require the Reduce transition to select
the most preferred clause among the applicable ones [160]. Assuming such a preference, specified
by textual order, the following program implements a fair merger. It achieves fairness by switching
the two input streams when an element from the preferred stream is read:

merge([X|In1],In2,Out) ← Out=[X|Out'],
    merge(In2,In1,Out').
merge(In1,[X|In2],Out) ← Out=[X|Out'],
    merge(In1,In2,Out').
merge([ ],In,Out) ← In=Out.
merge(In,[ ],Out) ← In=Out.
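The effect of switching the inputs can be seen in a small Python simulation. This is a sketch under simplifying assumptions: the two deques hold the elements currently available on each stream, textual clause order is modelled by the order of the `if` tests, and the helper names `fair_merge_step` and `fair_merge` are introduced only for this illustration.

```python
from collections import deque


def fair_merge_step(in1, in2, out):
    """Perform one reduction and return the input pair for the next call."""
    if in1:                          # preferred clause: first input has an element
        out.append(in1.popleft())
        return in2, in1              # switch the roles of the two inputs
    if in2:                          # second clause: only the other input is ready
        out.append(in2.popleft())
        return in1, in2
    return in1, in2                  # nothing available: the process would suspend


def fair_merge(xs, ys):
    """Merge two fully available streams using the switching discipline."""
    a, b, out = deque(xs), deque(ys), []
    while a or b:
        a, b = fair_merge_step(a, b, out)
    return out
```

When both inputs always have elements available, the output strictly alternates between them, so neither stream can be ignored indefinitely.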

A process p(...,Is) that responds to interrupts on the stream Is can be defined by placing the
clause testing for an interrupt first:

p(...,[I|Is]) ← interrupt_handler(...,[I|Is]).

... other clauses for p ...

Although simple to define operationally, strict preferences are problematic. From a methodological
point of view, they destroy the clause-wise modularity of programs. This may suggest awkward
programming techniques (à la red cuts in Prolog [171]), and make the life of program analysers,
transformers and compilers more difficult. From an implementation point of view, preferences
require strict synchronisation, since a second clause can be selected only if nothing happens that
may enable the first one. In fact, a correct distributed implementation of strict preferences may
require locking all the variables involved in the reduction of the first clause while attempting to
reduce with the second clause. A weaker notion of preference seems more desirable, but it is not
clear how it should be defined.

9 i.e. enabled on all but a finite number of the states of c.

Another approach is to use explicit conditions on clauses. Assume a guard primitive var(X),
which succeeds if X is a variable, and fails otherwise. Using var, a fair merger can be written as
follows, with base cases as above:

merge([X|In1],In2,Out) ← Out=[X|Out'],
    merge(In2,In1,Out').
merge(In1,[X|In2],Out) ← var(In1) | Out=[X|Out'],
    merge(In1,In2,Out').

A process p(...,Is) which is sensitive to interrupts on Is can be defined by adding to all its clauses
which do not serve the interrupt the test var(Is). The use of var to achieve Or-fairness was
first proposed by Kusalik [104]. The var test approach is better than preferences since it does not
destroy clause-wise modularity, and has no effect when not used. Its drawback is that var seems
too strong a tool for this purpose. From a methodological point of view, it offers opportunities for
abuse. From an implementation point of view, var, like preferences, implies tight synchronisation.
To implement correctly an interrupt-sensitive process thus defined, the Is variable has to be locked
whenever a reduction of the processes using a clause with a var(Is) test is attempted. If there are
hundreds or thousands of such processes, all sharing the same interrupt stream Is, this would be
prohibitively expensive.

For that reason a weaker primitive, called unknown(X), was defined [191]. Intuitively, unknown
is similar to var, except that its definition allows unknown(X) to ‘ignore’ for some finite,
but bounded, amount of time the fact that X was instantiated. For example, if X is instantiated to
a in processor P1, unknown(X) can succeed after that time in another processor P2. However, the
fact that X has a value should eventually reach P2, preventing unknown(X) tests from succeeding
thereafter.

In our interleaving-based transition system, this intuitive definition is formalized as follows.
unknown(X) behaves like var(X), except that it may succeed only a finite number of times after
X becomes a non-variable. In other words, if in a computation a variable X is instantiated to a
non-variable term, then the computation does not have infinitely many transitions in which the
check of the guard predicate unknown(X) succeeds. Using unknown instead of var in the above
programs would achieve the desired effect: the merger would be fair, and the process p would
eventually respond to an interrupt. This is achieved without heavy synchronization costs, and
without giving the programmer too powerful a tool, since unknown(X) succeeding does not imply
that X is presently not instantiated. In a language without atomic variables (see the discussion of
Flat GHC in Section 10), the difference between var and unknown is immaterial.

Note that unlike the other guard test predicates introduced, the success of the var and unknown
primitives is defined operationally, without reference to the notion of truth.


7.  Advanced  Concurrent  Logic  Programming  Techniques 

The power of concurrent logic programming languages comes from the wide range of concurrent
programming techniques they support. To convey this, we have assembled a range of FCP(|)
programs which demonstrate these techniques.

7.1  Static  process  networks 

Processes operating on streams can be composed into networks. This section shows two examples
of a static process network.

A static network of stream transducers: a solution to Hamming’s problem

The following program [42] solves the so-called Hamming’s problem [38]: generate an ordered
stream of all numbers of the form 2^i 3^j 5^k without repetition.

% hamming(Xs) ← Xs is the ordered stream of all numbers of the form 2^i 3^j 5^k.

hamming(Xs) ←
    multiply([1|Xs],2,X2),
    multiply([1|Xs],3,X3),
    multiply([1|Xs],5,X5),
    omerge'(X2,X3,X23),
    omerge'(X5,X23,Xs).

where omerge' is a variant of omerge shown in Section 5, which removes duplicates from its input
stream.

% omerge'(In1,In2,Out) ←
    If In1 and In2 are ordered streams of numbers, then Out is an ordered merge of In1 and In2
    with duplicates removed.

omerge'([X|In1],[X|In2],Out) ← Out=[X|Out'],
    omerge'(In1,In2,Out').
omerge'([X|In1],[Y|In2],Out) ← X<Y | Out=[X|Out'],
    omerge'(In1,[Y|In2],Out').
omerge'([X|In1],[Y|In2],Out) ← X>Y | Out=[Y|Out'],
    omerge'([X|In1],In2,Out').
omerge'([ ],In2,Out) ← In2=Out.
omerge'(In1,[ ],Out) ← In1=Out.

multiply  was  defined  in  Section  5. 
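The feedback structure of the hamming network can be imitated sequentially in Python. This is a sketch under the assumption that a growing list plays the role of the shared stream [1|Xs] and that three cursors into it play the roles of the three multiply processes; the parameter `count` and the cursor names are introduced here for illustration only.

```python
def hamming(count):
    """Return the first `count` elements of the stream Xs."""
    xs = [1]                 # the shared stream [1|Xs]
    i2 = i3 = i5 = 0         # read positions of the three multiply processes
    while len(xs) < count + 1:
        c2, c3, c5 = 2 * xs[i2], 3 * xs[i3], 5 * xs[i5]
        nxt = min(c2, c3, c5)            # what the two mergers emit next
        xs.append(nxt)
        # Advance every cursor that produced nxt; advancing all of them
        # is what removes duplicates, as omerge' does in the network.
        if nxt == c2:
            i2 += 1
        if nxt == c3:
            i3 += 1
        if nxt == c5:
            i5 += 1
    return xs[1:]            # Xs itself does not include the seed 1
```

As in the FCP(|) program, each new element of the output immediately becomes input to all three multipliers, which is what keeps the network running.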

A  static  network:  the  MSG  message  sending  system 

The msg process network is a simple message sending system for two computer terminals. Input
from each of the keyboards K1 and K2 is a stream of messages, including messages of the form
message(X). Every message M on K1 is echoed on S1 as 1:M. In addition, a message of the form
message(X) on K1 is also echoed on S2 as 1:message(X). Similarly for messages on K2 [93]. The
program uses the merge process and a variant of the distribute process defined in Section 5.

% msg(K1,S1,K2,S2) ←
    S1 is an interleaving of 1:X such that X is in K1 and 2:message(X) such that message(X) is
    in K2. Similarly S2 is an interleaving of 2:X such that X is in K2 and 1:message(X) such that
    message(X) is in K1.

msg(K1,S1,K2,S2) ←
    distribute(1,K1,K11,K12),
    distribute(2,K2,K22,K21),
    merge(K11,K21,S1),
    merge(K22,K12,S2).

distribute(Id,[message(X)|In],Out1,Out2) ←
    Out1=[Id:message(X)|Out1'],
    Out2=[Id:message(X)|Out2'],
    distribute(Id,In,Out1',Out2').
distribute(Id,[X|In],Out1,Out2) ←
    X ≠ message(_) |
    Out1=[Id:X|Out1'],
    distribute(Id,In,Out1',Out2).
distribute(Id,[ ],Out1,Out2) ← Out1=[ ], Out2=[ ].

7.2 Dynamic process networks

We show examples of dynamic process networks of various topologies. They are dynamic since
their size depends on their input. Dynamic process networks which solve algorithmic problems
typically exhibit two phases of operation: a spawning phase, in which the process network is
spawned, and a “systolic” phase [103], in which the processes in the network perform both local
computations and communication. It is interesting to note that many of the concurrent logic
programs shown below, which implement “systolic”-like parallel algorithms, are almost identical,
as logic programs, to Prolog programs which implement the corresponding sequential algorithms.
More on the relation between systolic algorithms and concurrent logic programming can be found
in [162].

Process  pipes:  linear  process  networks 

The following program is a parallel implementation of the Sieve of Eratosthenes [163]. It consists
of a process generating all integers in the desired range, and a set of filter processes, one per prime
number found, which erase multiples of their prime from the remaining stream. This program
overlaps the dynamic construction of the process network with the computation of the result.
Its network consists of a dynamic linear pipeline of transducers. It uses the integers and filter
processes defined in Section 5.

% primes(N,Ps) ← Ps are all the primes up to N.

primes(N,Ps) ←
    integers(2,N,Ns), sift(Ns,Ps).

% sift(Ns,Ps) ← Ps are the numbers in Ns which are prime relative to their predecessors.

sift([P|Ns],Ps) ← Ps=[P|Ps'],
    filter(Ns,P,Ns'), sift(Ns',Ps').
sift([ ],Ps) ← Ps=[ ].
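The growth of the pipeline can be imitated with nested Python generators. The sketch below assumes that each generator object stands for one process and that pulling from the outermost generator drives the whole chain; it recurses once per prime found, so it is meant only for small N. The name `filter_multiples` again replaces filter to avoid the Python built-in.

```python
def integers(frm, to):
    """Producer: the stream frm, frm+1, ..., to."""
    yield from range(frm, to + 1)


def filter_multiples(stream, p):
    """Transducer: drop every multiple of p."""
    for x in stream:
        if x % p != 0:
            yield x


def sift(ns):
    """Emit the head P, then sift the tail with a new filter for P in front."""
    ns = iter(ns)
    p = next(ns, None)
    if p is None:                                  # sift([ ],Ps): close the output
        return
    yield p                                        # Ps = [P|Ps']
    yield from sift(filter_multiples(ns, p))       # filter(Ns,P,Ns'), sift(Ns',Ps')


def primes(n):
    return list(sift(integers(2, n)))
```

Each prime found adds one more filter stage between the integer producer and the consumer, so the network is built while results are already flowing through it.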

Mergesort 

The following program implements a parallel mergesort algorithm. Given a list of length N of
(possibly singleton) sorted lists, it forms a pipeline of length log_2 N of msort processes. Each
stage in the pipeline performs a pairwise order-preserving merge of the sublists, using the omerge
procedure defined in Section 5. Each stage doubles the length of the sublists and divides their
number by approximately 2. Using log_2 N processors, this program can sort a list of N elements in
time O(N). See [189] for further discussion of the complexity of concurrent logic programs.

% mergesort(In,Out) ←
    If In is a list of ordered lists of numbers then Out is a sorted list of these numbers.

mergesort([ ],Ys) ← Ys=[ ].
mergesort([Xs],Ys) ← Xs=Ys.
mergesort([X1,X2|Xs],Ys) ←
    msort([X1,X2|Xs],Zs),
    mergesort(Zs,Ys).

msort([X1,X2|Xs],Ys) ← Ys=[Y|Ys'],
    omerge(X1,X2,Y),
    msort(Xs,Ys').
msort([X],Ys) ← Ys=[X].
msort([ ],Ys) ← Ys=[ ].
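The stage-by-stage halving can be traced with a sequential Python sketch. It assumes that the standard library function `heapq.merge` is an acceptable stand-in for omerge, and it runs the stages one after another rather than as a pipeline, so it shows the data flow but not the parallelism.

```python
from heapq import merge as omerge   # ordered merge of sorted iterables


def msort(lists):
    """One stage: merge the sublists pairwise, keeping an odd one out."""
    out = []
    for i in range(0, len(lists) - 1, 2):
        out.append(list(omerge(lists[i], lists[i + 1])))
    if len(lists) % 2 == 1:          # msort([X],Ys) <- Ys=[X]
        out.append(lists[-1])
    return out


def mergesort(lists):
    """Apply msort stages until a single sorted list remains."""
    if not lists:                    # mergesort([ ],Ys) <- Ys=[ ]
        return []
    while len(lists) > 1:
        lists = msort(lists)
    return lists[0]
```

For an input of N singleton lists the `while` loop runs about log_2 N times, matching the length of the msort pipeline described above.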




Vector-matrix  multiplication 

A linear array of ip processes can multiply a matrix, represented by a list of vectors, by a vector.
It uses the ip process defined in Section 5.

% vm(Xv,Ym,Zv) ← multiplying the vector Xv by the matrix Ym gives the vector Zv.

vm(_,[ ],Zv) ← Zv=[ ].
vm(Xv,[Yv|Ym],Zv) ← Zv=[Z|Zv'],
    ip(Xv,Yv,Z), vm(Xv,Ym,Zv').

Process  trees 

The  following  program  merges  a  list  of  streams  into  one  stream,  by  creating  a  balanced  tree  of 
binary  merge  processes.  It  uses  the  merge  process  defined  in  Section  6. 

% merger(In,Out) ← Out is the merge of the list of streams In.

merger([Xs1,Xs2|In],Out) ←
    merge_layer([Xs1,Xs2|In],Out'),
    merger(Out',Out).
merger([Xs],Out) ← Xs=Out.

merge_layer([Xs1,Xs2|In],Out) ← Out=[Ys|Out'],
    merge(Xs1,Xs2,Ys),
    merge_layer(In,Out').
merge_layer([Xs],Out) ← Out=[Xs].
merge_layer([ ],Out) ← Out=[ ].

Note  that  this  program  operates  correctly  only  if  the  complete  list  of  streams  is  given,  since 
it  emits  elements  from  the  root  of  the  tree  only  after  the  construction  of  the  tree  is  completed. 
If  the  list  is  given  incrementally,  i.e.  it  is  actually  a  stream  of  streams,  then  a  different  approach 
is  needed.  One  naive  solution  is  to  create  an  unbalanced  tree  incrementally,  using  the  following 
program. 

merger1([Xs|In],Out) ←
    merge(Xs,Out',Out),
    merger1(In,Out').
merger1([ ],Out) ← Out=[ ].

The program builds a linear tree top down. At each point in its construction, elements of the
input streams already connected to merge processes can reach the root of the tree.

More  sophisticated  balanced  merge  trees  can  be  constructed,  which  support  the  dynamic 
addition  and  deletion  of  merged  streams,  using  the  concept  of  two-three  trees,  as  shown  in  this 
section  below. 

Process  arrays 

Two  matrices  can  be  multiplied  by  an  array  of  ip  processes,  each  computing  the  inner  product  of 
the  appropriate  row  and  a  column  of  the  matrices.  We  assume  that  the  second  matrix  is  already 
transposed. 

% mm(Xm,Ym,Zm) ←
    Zm is the result of multiplying the matrix Xm with the transposed matrix Ym.

mm([ ],_,Zm) ← Zm=[ ].
mm([Xv|Xm],Ym,Zm) ← Zm=[Zv|Zm'],
    vm(Xv,Ym,Zv), mm(Xm,Ym,Zm').

The program uses the vm process defined above.

The behavior of the mm program is reminiscent of the well-known systolic algorithm for
multiplying two matrices [103]. However, there are two differences. First, the process network is
created dynamically, to fit the size of the input matrices. This suggested the name¹⁰ “soft-systolic”
for these kinds of software-oriented systolic algorithms, in contrast with the classical hardware-oriented
“hard-systolic” algorithms. Another difference is that the program, as specified, does not
pipeline the matrices along the connections between the ip processes, but rather “broadcasts” each
row and column to all processes requiring it. The program, however, can be easily modified to do
the pipelining. See [163,189] for further discussions of the subject.
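The nesting of ip inside vm inside mm can be checked on small inputs with a direct Python rendering. The sketch assumes plain lists for vectors and matrices and, as in the text, expects the second matrix in transposed form; it computes the same values sequentially and says nothing about how the processes would be scheduled.

```python
def ip(xs, ys):
    """Inner product of two equal-length vectors, accumulated left to right."""
    p = 0
    for x, y in zip(xs, ys):
        p += x * y                   # P' := P + X*Y
    return p


def vm(xv, ym):
    """Multiply vector xv by a matrix given as a list of column vectors."""
    return [ip(xv, yv) for yv in ym]


def mm(xm, ym_transposed):
    """Multiply matrix xm by a matrix supplied in transposed form."""
    return [vm(xv, ym_transposed) for xv in xm]
```

For example, multiplying [[1,2],[3,4]] by [[5,6],[7,8]] means calling `mm([[1, 2], [3, 4]], [[5, 7], [6, 8]])`, where the second argument lists the columns of the second matrix. Every row of the first matrix is paired with every such column, which is the “broadcast” pattern mentioned above.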

To achieve pipelining, the mm program has to be modified to form direct connections between
adjacent ip processes. A similar goal is achieved by the following torus program. Given a matrix
Array, represented as a list of lists of values, it spawns a torus of cell processes, each with one value
and with communication links to adjacent cell processes. The array is augmented with end-round
connections, to form a torus process network. This program schema (cliché) has several
applications, including array relaxation and the like.

torus(Array,...) ←
    torus'(Array,Bottoms,Tops,...),
    Bottoms=Tops.

torus'([Row|Array],Bottoms,Tops,...) ←
    row(Row,Left,Right,Bottoms,Middles,...),
    Left=Right,
    torus'(Array,Middles,Tops,...).
torus'([ ],Bottoms,Tops) ←
    Bottoms=Tops.

row([Element|Row],Left,Right,[Bottom|Bs],[Top|Ts],...) ←
    cell(Element,Left,Middle,Bottom,Top,...),
    row(Row,Middle,Right,Bs,Ts,...).
row([ ],Left,Right,[ ],[ ],...) ←
    Left=Right.

cell(Element,Left,Right,Bottom,Top,...) ←
    An application specific program.

The  layered  stream  method 

Search problems that are amenable to depth-first search have elegant and efficient solutions in
Prolog.  Assume  that  the  solution  to  the  search  problem  is  in  the  form  of  a  list  of  elements,  which 
satisfy  some  consistency  criterion.  The  incremental  construction  of  a  solution  in  Prolog  often  relies 
on  its  backtracking  mechanism,  where  forward  computation  consists  of  extending  some  prefix  of  a 
solution,  and  backtracking  occurs  when  it  is  discovered  that  the  prefix  cannot  be  extended  further, 
either  because  of  inconsistencies  or  because  it  is  a  complete  solution  and  additional  solutions  are 
required. 

The layered stream data structure, proposed by Okumura and Matsumoto [139], allows an
incremental and parallel construction of solutions to search problems without relying on a Prolog-like
backtracking mechanism. A layered stream represents a set of lists sharing a common head as
the cross-product of the head and the list of tails. The list of tails in turn can be represented by a
layered stream. For example the lists [1,2,4], [1,2,8], [1,5,9] are represented by the layered stream
1*[[2,4],[2,8],[5,9]], and also by the layered stream 1*[2*[[4],[8]],[5,9]]. The function
symbol ‘*’ is used for mnemonic purposes, but of course any other function symbol would do. The
product X*[ ] represents the empty set of lists, and the product X*true represents X.

The following programming methodology is associated with the layered stream. Suppose the
problem is to find values for N variables ranging over some finite domain of values V. N·|V|
processes are initially created, one for each possible value of each variable. Denote the process
associated with value v of variable n by p_{n,v}. Each p_{n,v} process receives an input stream of
partial solutions which consist of an assignment to variables 1,...,n-1, and produces an output
stream of partial solutions obtained by extending each input assignment with the assignment of v to
variable n, provided the resulting assignment is consistent. That is, given the input partial solution
v_1,...,v_{n-1}, if the extended partial solution v_1,...,v_{n-1},v is consistent, it is output. Otherwise it
is not.

10 Due to Vijay A. Saraswat.

The layered stream data structure allows all processes p_{n,v}, v ∈ V, to share the same input
partial solutions, thus saving space and hence also time. It allows the pipelining of partial solutions,
hence increases the available parallelism. An example of a search program using a layered stream
is the four queens program shown below [139]. The program easily generalizes to N queens by
replacing the explicit construction of the filter processes by iterative procedures to do so.

% four_queens(Qs) ←
    Qs is a layered stream of all legal assignments of four queens on a 4×4 board.

four_queens(Qs) ←
    queen(true,Qs1), queen(Qs1,Qs2), queen(Qs2,Qs3), queen(Qs3,Qs).

% queen(In,Out) ←
    If In represents the set of legal assignments of N queens on a 4×4 board, Out represents the
    set of legal assignments of N+1 queens on a 4×4 board.

queen(In,Out) ←
    filter(In,1,1,Out1), filter(In,2,1,Out2), filter(In,3,1,Out3), filter(In,4,1,Out4),
    Out=[1*Out1,2*Out2,3*Out3,4*Out4].

% filter(In,I,D,Out) ←
    If In is a set of assignments of N queens to consecutive columns, Out is the set of assignments of
    N+1 queens obtained as follows: Extend each assignment in In with a queen on the next column
    and on row I. If the added queen does not interfere with the previous queens, incorporate the
    extended assignment in Out.

filter(true,_,_,Out) ← Out=true.
filter([ ],_,_,Out) ← Out=[ ].
filter([I*_|In],I,D,Out) ← filter(In,I,D,Out).                    % Same row
filter([J*_|In],I,D,Out) ← D=:=abs(I-J) | filter(In,I,D,Out).     % Same diagonal
filter([J*In'|In],I,D,Out) ← I≠J, D=\=abs(I-J) |                  % No interference
    D' := D+1,
    filter(In',I,D',Out'),
    filter(In,I,D,Out''),
    Out=[J*Out'|Out''].

The answer obtained from the goal four_queens(Qs) is the following layered stream:

Qs = [1 * [3 * [ ], 4 * [2 * [ ]]],
      2 * [4 * [1 * [3 * true]]],
      3 * [1 * [4 * [2 * true]]],
      4 * [1 * [3 * [ ]], 2 * [ ]]]

which represents the list of lists:

Qs = [[2,4,1,3], [3,1,4,2]].
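The layered stream computation can be replayed in Python to see how the nested structure arises. The sketch makes its own representation choices: the atom true is modelled by Python's `True`, a product J*Tail by the pair `(j, tail)`, and the helper `expand`, which flattens a layered stream into ordinary lists, is added for illustration and is not part of the original program.

```python
def filt(inp, i, d):
    """Counterpart of filter(In,I,D,Out) on the tuple representation."""
    if inp is True:                       # filter(true,_,_,Out) <- Out=true
        return True
    out = []
    for j, tail in inp:
        if j == i or d == abs(i - j):     # same row or same diagonal: drop
            continue
        out.append((j, filt(tail, i, d + 1)))   # no interference: go one layer down
    return out


def queen(inp, size=4):
    """Extend every assignment in inp with a queen on each of the rows."""
    return [(i, filt(inp, i, 1)) for i in range(1, size + 1)]


def four_queens():
    qs = True
    for _ in range(4):                    # four queen processes in a pipeline
        qs = queen(qs)
    return qs


def expand(ls):
    """Flatten a layered stream into the list of lists it represents."""
    if ls is True:
        return [[]]
    return [[h] + rest for h, tail in ls for rest in expand(tail)]
```

Calling `expand(four_queens())` discards the branches that end in an empty list, leaving only the complete assignments.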

A comparison of the sequential and parallel performance of this program with other concurrent
logic programs and Prolog programs for the N-queens problem is given by Tick [193].

7.3 Incomplete message protocols

Incomplete  message  protocols  were  reviewed  in  Section  4.4.  Here  we  show  several  examples  of  their 
application.  The  first  application  is  monitors.  A  monitor  is  a  process  that  maintains  some  local 
state,  and  serves  requests  to  inspect  and/or  modify  the  state.  It  is  called  so  since  its  function  is 
similar  to  Hoare’s  original  concept  of  a  monitor  [85].  Clients  of  a  monitor  are  typically  connected 
to  it  via  a  merge  network,  and  communicate  with  it  using  incomplete  messages. 

Perhaps the simplest monitor is the counter, which maintains a local counter, and responds to
the messages clear, add, and read(X), the last of which is an incomplete message.

% counter(In) ←
    In is a stream of clear, add, and read(X) such that X is the number of adds since the most
    recent clear.

counter(In) ← counter'(In,0).

counter'([clear|In],C) ← counter'(In,0).
counter'([add|In],C) ← C' := C+1, counter'(In,C').
counter'([read(X)|In],C) ← X=C, counter'(In,C).
counter'([ ],C).

The client of a counter who wishes to know its value sends it the message read(X), and waits for
X to be instantiated.
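The reply-through-the-message idea can be sketched in Python with a single-assignment cell. The class `Var` below is a hypothetical stand-in for a logic variable, introduced only for this example, and the message encoding (strings for clear and add, a pair for read) is likewise an assumption of the sketch.

```python
class Var:
    """A write-once cell standing in for an uninstantiated logic variable."""
    _UNBOUND = object()

    def __init__(self):
        self.value = Var._UNBOUND

    def bind(self, value):
        if self.value is not Var._UNBOUND:
            raise ValueError("variable is already instantiated")
        self.value = value


def counter(messages):
    """Serve a stream of 'clear', 'add' and ('read', Var) messages."""
    c = 0
    for msg in messages:
        if msg == "clear":
            c = 0
        elif msg == "add":
            c += 1
        elif isinstance(msg, tuple) and msg[0] == "read":
            msg[1].bind(c)          # X = C: the reply travels back in the message
```

The client keeps a reference to the `Var` it placed in the read message and finds the answer there once the monitor has processed it.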

Shared  queues 

A more sophisticated monitor is the following queue process. It serves requests of the form
enqueue(X) and dequeue(X), by unifying the arguments of corresponding enqueue and dequeue
requests, and maintaining arguments of superfluous requests. While a list is a natural data structure
for representing a stack, a difference-list is most convenient for representing a queue. The
arguments of superfluous requests are maintained in a difference-list data-structure, which is positive if
the process received more enqueue than dequeue requests, empty if the numbers of requests received
of each type are equal, and negative otherwise. A difference-list, explained in Section 2.2, is a common
data-structure both in Prolog and in concurrent logic programming languages [27,160,171].

% queue(In) ←
%     In is a stream of enqueue(X) and dequeue(X), for which the list of X's such that enqueue(X)
%     is in In is identical to the list of X's such that dequeue(X) is in In.

queue(In) ←
    queue'(In,Q\Q).

queue'([dequeue(X)|In],H\T) ← H=[X|H'],
    queue'(In,H'\T).
queue'([enqueue(X)|In],H\T) ← T=[X|T'],
    queue'(In,H\T').
queue'([ ],H\T).
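The "positive" and "negative" states of the difference-list can be made explicit in a conventional language. The sketch below is an illustration under assumptions, not the FCP(|) program: it uses two Python deques where the logic program needs only one difference-list, and Var is a hypothetical write-once cell standing in for the variable carried by a dequeue request.

```python
from collections import deque


class Var:
    """Write-once cell standing in for a logical variable (an assumption of this sketch)."""

    def __init__(self):
        self.bound = False
        self.value = None

    def bind(self, value):
        if self.bound:
            raise ValueError("variable already instantiated")
        self.bound = True
        self.value = value


def queue(stream):
    """Match enqueue(X) and dequeue(X) requests in order of arrival."""
    values = deque()   # surplus enqueued values: the "positive" difference-list
    readers = deque()  # surplus dequeue variables: the "negative" difference-list
    for tag, arg in stream:
        if tag == "enqueue":
            if readers:
                readers.popleft().bind(arg)
            else:
                values.append(arg)
        elif tag == "dequeue":
            if values:
                arg.bind(values.popleft())
            else:
                readers.append(arg)
        else:
            raise ValueError(f"unknown request {tag!r}")
    return values, readers


a, b, c = Var(), Var(), Var()
left_values, left_readers = queue([
    ("dequeue", a), ("enqueue", 1), ("enqueue", 2),
    ("dequeue", b), ("dequeue", c), ("enqueue", 3),
])
print(a.value, b.value, c.value)
```

In the logic program both cases collapse into a single unification, which is why no test for "queue empty" appears there.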

Another useful monitor is a priority queue. A priority queue has two input streams: one of
enqueue requests of the form enqueue(X,P), and one of dequeue requests of the form dequeue(X).
It maintains an internal priority queue, which is a list of the elements enqueued but not dequeued,
sorted by their priority. It serves a dequeue request only if the queue is non-empty.

% pqueue(Es,Ds) ←
%     Es is a list of enqueue(X,P) and Ds is a list of dequeue(Y) for which the corresponding
%     multisets of X's and Y's are the same, and if dequeue(X) precedes dequeue(Y) in Ds then either
%     enqueue(X,PX) precedes enqueue(Y,PY) in Es or enqueue(Y,PY) precedes enqueue(X,PX) in
%     Es and PY<PX.

pqueue(Es,Ds) ←
    pqueue'(Es,Ds,[ ]).

pqueue'([enqueue(X,P)|Es],Ds,Q) ←
    insert(X,P,Q,Q'),
    pqueue'(Es,Ds,Q').
pqueue'(Es,[dequeue(X)|Ds],[(Y,P)|Q]) ← X=Y,
    pqueue'(Es,Ds,Q).
pqueue'([ ],[ ],Q).

insert(X,P,[ ],Q) ← Q=[(X,P)].
insert(X,P,[(X',P')|Q],Q') ←
    P < P' | Q'=[(X,P),(X',P')|Q].
insert(X,P,[(X',P')|Q],Q') ←
    P ≥ P' | Q'=[(X',P')|Q''],
    insert(X,P,Q,Q'').
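The ordered insertion can be checked on a small case with a Python sketch. It rests on two assumptions that are not stated in the text: the element with the smallest P is served first, and elements of equal priority are served in arrival order. The two input streams are flattened into one list of events, which fixes a single interleaving out of the many the concurrent program admits.

```python
def insert(x, p, queue):
    """Return a new list with (x, p) placed before the first entry of larger priority value."""
    for i, (_, p2) in enumerate(queue):
        if p < p2:
            return queue[:i] + [(x, p)] + queue[i:]
    return queue + [(x, p)]


def pqueue(events):
    """Process ("enqueue", X, P) and ("dequeue",) events; return the served elements."""
    queue = []
    served = []
    for event in events:
        if event[0] == "enqueue":
            _, x, p = event
            queue = insert(x, p, queue)
        elif event[0] == "dequeue":
            if not queue:
                # The concurrent process would suspend here until an enqueue arrives.
                raise RuntimeError("dequeue on an empty priority queue")
            (x, _), queue = queue[0], queue[1:]
            served.append(x)
        else:
            raise ValueError(f"unknown event {event!r}")
    return served


order = pqueue([
    ("enqueue", "a", 2), ("enqueue", "b", 1), ("enqueue", "c", 2),
    ("dequeue",), ("dequeue",), ("dequeue",),
])
print(order)
```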

Merge  trees 

Another application of incomplete message protocols is network reconfiguration. We show here a
simple example of a dynamic two-three merge tree. A process p(Xs,...) with a stream Xs to the
tree can create a new process q(Ys,...) with a stream Ys, and join Ys to the tree by sending down
Xs the message merge(Ys). For example:

p(Xs,...) ←
    Xs=[merge(Ys)|Xs'],
    p(Xs',...),
    q(Ys,...).

A balanced tree of merge processes capable of handling such messages can be composed of binary
and ternary mergers, defined below. We show the clauses which handle messages on the first
input stream only. Handling messages on the other streams is done by similar clauses of the same
procedure. Note how a merge2 process that receives a merge(X) message turns into a merge3,
and a merge3 process that receives such a message turns into two merge2's, and sends up another
merge message.

merge2([X|Xs],Ys,Zs) ←
    X≠merge(_) | Zs=[X|Zs'],
    merge2(Xs,Ys,Zs').
merge2([merge(Ws)|Xs],Ys,Zs) ←
    merge3(Ws,Xs,Ys,Zs).
merge2([ ],[ ],Zs) ← Zs=[ ].

merge3([W|Ws],Xs,Ys,Zs) ←
    W≠merge(_) | Zs=[W|Zs'],
    merge3(Ws,Xs,Ys,Zs').
merge3([merge(Ws1)|Ws],Xs,Ys,Zs) ←
    Zs=[merge(Zs1)|Zs'],
    merge2(Ws1,Ws,Zs1),
    merge2(Xs,Ys,Zs').
merge3([ ],[ ],[ ],Zs) ← Zs=[ ].

The  merge  tree  described  grows  dynamically,  but  does  not  shrink  in  a  balanced  way.  An  extension 
based on a distributed variant of the two-three tree deletion algorithm is described in [167].

The bounded-buffer protocol

In  several  situations  it  is  desirable  to  allow  the  reader  of  a  stream  some  degree  of  control  over  its 
writer. Examples are when the reader is much slower than the writer, and when only some prefix
of the produced stream is required, but its size can only be determined by the reader at runtime.
The  bounded-buffer  protocol  [177]  employs  difference  lists  and  incomplete  messages  to  realize  this 
kind  of  control. 

The idea of the bounded-buffer protocol is simple: the controlling reader process maintains a
difference-list H\T of incomplete messages, say of the form message(X), where X is a variable.
The difference-list represents the "buffer". After the buffer is initialised to a list of n incomplete
messages, the reader operates as follows: when it is ready to process the next message, it waits
until the first element in the buffer is known, i.e. H=[message(X)|H'], where X is known, dequeues
it, and enqueues an incomplete message, message(X'), at the tail of the difference list. When it
does not desire to receive any further messages it unifies the tail with nil. What to do in such a
case with the messages pending in the buffer is application dependent.

The producer is given initially the head of the difference list as its input stream. It then
operates as follows. It waits until its input stream has the message message(X), produces the next
element, unifying it with X, and iterates with the tail of its input stream. It terminates when its
input stream is nil.

Schematic  programs  for  the  producer  and  the  reader  are  shown  below. 

bounded_buffer_network(...) ←
    buffer(n,H\T),
    read(H\T,...),
    produce(H,...).

% buffer(N,H\T) ←
%     H\T is a difference list of message(_) of size N.

buffer(0,H\T) ← H=T.
buffer(N,H\T) ←
    N>0 |
    N':=N-1, H=[message(_)|H'],
    buffer(N',H'\T).

read([message(X)|H]\T,...) ←
    known(X),
    ... want more X's ... |
    T=[message(_)|T'],
    ... process X ...,
    read(H\T',...).
read(H\T,...) ←
    ... don't want more X's ... |
    T=[ ],
    ... process remaining messages in H ....

produce([message(X)|In],...) ←
    ... produce X ...,
    produce(In,...).
produce([ ],...).
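The effect of the protocol, namely that the producer can never run more than n elements ahead of the reader, can be observed in a sequential Python sketch. This is an illustration under assumptions rather than a translation: the function name bounded_buffer_run is hypothetical, None marks a slot the producer has not yet filled (so the source must not yield None), and the producer is given a turn before each read instead of running concurrently.

```python
from collections import deque
from itertools import count


def bounded_buffer_run(source, n, wanted):
    """Read `wanted` elements from `source` through a buffer of `n` slots (n, wanted >= 1).

    Returns the elements read, how many the producer generated, and the largest
    number of elements that were ever produced but not yet consumed.
    """
    it = iter(source)
    buffer = deque([None] * n)      # buffer(n,H\T): n incomplete messages
    consumed = []
    produced = 0
    max_lead = 0
    while True:
        # Producer: may instantiate only the slots the reader has issued.
        for i in range(len(buffer)):
            if buffer[i] is None:
                buffer[i] = next(it)
                produced += 1
        max_lead = max(max_lead, produced - len(consumed))
        # Reader: dequeue the first, now known, element.
        consumed.append(buffer.popleft())
        if len(consumed) == wanted:
            break                   # T=[ ]: no further messages wanted
        buffer.append(None)         # T=[message(_)|T']: issue one more slot
    return consumed, produced, max_lead


items, produced, max_lead = bounded_buffer_run(count(), n=3, wanted=5)
print(items, produced, max_lead)
```

Although the source is infinite, the producer stops after a bounded number of elements, because it is never handed more slots to fill.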

Several variations on this protocol are possible. For example, it is not necessary for the reader to
maintain a fixed size buffer: it can increase or decrease the size of the buffer if it so desires. It
is not necessary to synchronize on every message: a more efficient protocol might be to produce
k stream elements per incomplete message, or to provide a parameter in the incomplete message,
specifying how many more elements to produce. Finally, it is possible for the incomplete message
to be simply a variable, rather than a term containing a variable.

7.4 Mutual exclusion protocols

Mutual exclusion can be achieved in FCP(|) using the following mechanism. The set of processes
participating in the mutual exclusion protocol is connected via a merge network to a mutex
process. A single-round mutual exclusion protocol is as follows: all processes competing for the lock
send a lock(Reply) incomplete message to mutex; mutex grants the first lock request received by
unifying Reply = granted, and denies the other requests by unifying Reply = denied. It is defined
as follows:

% mutex(In) ← In is a list containing one lock(granted) followed by zero or more lock(denied).

mutex([lock(Reply)|In]) ← Reply=granted,
    mutex'(In).

mutex'([lock(Reply)|In]) ← Reply=denied,
    mutex'(In).
mutex'([ ]).

The single-round mutual exclusion protocol can be used to simulate CSP with input guards [87].
A  simulation  of  CSP  with  both  input  and  output  guards  is  discussed  in  Section  14. 

A multiple-round mutual exclusion protocol is only slightly more complex. Instead of simple
back-communication, it uses a three-stage dialogue: the process requests the lock, then mutex
grants it, then the process releases the lock, and mutex serves the next lock request. Processes
competing for permission send lock(Reply) as before; mutex answers the first by Reply =
granted(Done), and waits for Done = done. When the process to which the lock was granted ends
its critical operation, it releases the lock by unifying Done = done; mutex then grants the next
lock, and so on. If the merge network is fair, and every process that is granted a lock eventually
releases it, then every lock request will eventually be granted.

The definition of the multiple-round mutex process is as follows. Its trivial logical reading
indicates that its interest lies in its reactive aspects only.

% mutex(In) ← In is a list of lock(granted(done)).

mutex(In) ←
    mutex'(In,done).

mutex'([lock(Reply)|In],done) ← Reply=granted(Done),
    mutex'(In,Done).
mutex'([ ],Done).

A program schema for a perpetual process p participating in a multiple-round mutual exclusion
protocol is shown below. We assume that initially its first argument is a stream merged to mutex;
other arguments are application specific.

p(ToMutex,...) ← p_request(done,ToMutex,...).

p_request(done,ToMutex,...) ← ToMutex=[lock(Reply)|ToMutex'],
    p_wait(Reply,ToMutex',...).

p_wait(granted(Done),ToMutex',...) ←
    ... do critical operation; when done, unify Done=done ...,
    p_request(Done,ToMutex',...).

7.5  Short-circuit  protocols  for  distributed  termination,  quiescence  detection,  and  distributed 
event-driven  simulation 

The problems of distributed termination detection and quiescence detection have received
considerable attention [15,16,40,51,105,126]. In concurrent logic programming, these problems have very
elegant solutions, using the short-circuit protocol. The protocol is originally due to Takeuchi [175],
and was later extended by Weinbaum and Shapiro [205] and Saraswat et al. [158]; we largely follow
[158] in the following discussion. The underlying behavior of implementations of this protocol is
closely related to that of distributed termination and quiescence detection algorithms based on
distributed counters [105,126]. We do not know of algorithms for distributed event-driven simulation
corresponding to the one based on the short-circuit.

Distributed  termination  detection 

The idea of the short circuit for termination detection is as follows. Call the computation whose
termination should be detected the underlying computation, and the program it executes the
underlying program. Augment each process participating in the underlying computation with two
additional arguments, called Left and Right. For readability, these arguments are typically packed
in one term using the infix function symbol "-", as in Left-Right. The pair is called a switch. It is
closed if Left=Right, open otherwise.

Initially, connect all processes in a chain, by unifying Left of the i-th process with Right of the
(i+1)-th process. The Right of the first process and the Left of the last process are called the ends
of the short-circuit. For n processes, the chain contains n open switches.

Each process in a computation operates as follows. If it halts it unifies its Left and Right
variables. If it iterates it leaves them unchanged. If it creates n new processes, it extends the
short-circuit by n-1 intermediate links. This behavior is achieved by transforming the clauses
of the underlying program along the following schema, where '...' denotes underlying program
arguments.

p(...).                  ⇒    p(...,L-R) ← L=R.

p(...) ←                 ⇒    p(...,L-R) ←
    p'(...).                      p'(...,L-R).

p(...) ←                 ⇒    p(...,L-R) ←
    p1(...),                      p1(...,L-X1),
    p2(...),                      p2(...,X1-X2),
    ...                           ...
    pn(...).                      pn(...,Xn-1-R).

In FCP(|), a correct use of the short-circuit requires threading it to the equality goal atoms in a
special way. If the underlying program has a body atom T1=T2, the transformed program should
have the atom (Left,T1)=(Right,T2) for the appropriate switch variables Left and Right, so that
the switch would not close before the underlying unification completes.

The invariant of the short circuit under this behavior is that the number of open switches
is identical to the number of processes in the computation. In particular, all switches are closed,
which implies that the two ends of the initial chain are identical, if and only if all processes in the
computation have terminated, which is a stable property.
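The invariant can be exercised with a small Python sketch of the circuit. The sketch makes several assumptions for illustration: Var, deref and unify are a hypothetical, minimal implementation of logical variables restricted to variables and atomic constants; one end of the circuit is the constant "done"; and processes are represented only by their switches.

```python
class Var:
    """Logical variable: unbound until aliased to another term."""

    def __init__(self):
        self.ref = None


def deref(term):
    """Follow variable bindings to the current representative term."""
    while isinstance(term, Var) and term.ref is not None:
        term = term.ref
    return term


def unify(a, b):
    """Unify two terms that are either variables or atomic constants."""
    a, b = deref(a), deref(b)
    if a is b:
        return
    if isinstance(a, Var):
        a.ref = b
    elif isinstance(b, Var):
        b.ref = a
    elif a != b:
        raise ValueError("unification failure")


def split(switch, n):
    """Extend one switch into n chained switches using n-1 fresh variables."""
    left, right = switch
    points = [left] + [Var() for _ in range(n - 1)] + [right]
    return list(zip(points, points[1:]))


def close(switch):
    """A halting process closes its switch: L=R."""
    unify(*switch)


def terminated(end):
    """The detector sees termination once the far end has become 'done'."""
    return deref(end) == "done"


right_end = Var()
procs = split(("done", right_end), 3)   # three initial processes
close(procs[1])
close(procs[0])
before_fork = terminated(right_end)     # the third process is still running
kids = split(procs[2], 2)               # it forks into two children
close(kids[1])
one_child_left = terminated(right_end)
close(kids[0])
all_halted = terminated(right_end)
print(before_fork, one_child_left, all_halted)
```

The order in which switches close does not matter; the constant reaches the far end only when every link has been joined.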

Any process wishing to detect that the computation has terminated is given the initial ends
of the short circuit. Assume the termination detecting process is called halted(Left-Right,...). It
can be implemented in FCP(|) in two ways:

halted(X-X,...) ← ... report termination ...

halted(Left-Right,...) ←
    Left=done, wait_for_done(Right,...).
wait_for_done(done,...) ← ... report termination ...

Distributed phased termination detection

Some computations consist of phases, where a process is allowed to begin computations of the
next phase only if all processes have completed the previous phase [130,208]. The short circuit
can be generalised to achieve phased termination detection as well. Instead of having one short
circuit, a stream of short circuits is threaded through the underlying computation. Each process
is augmented with a Left-Right switch as before, and with the original left and right ends of the
circuit, LeftEnd-RightEnd. However, instead of unifying Left and Right upon termination, it treats
Left and Right as streams. At the termination of a phase it unifies the head of Left with the head
of Right. Following that, it waits for the heads of LeftEnd and RightEnd to be identical before
it proceeds with the next phase. This is achieved by the following iterative schema:

p(Left-Right,LeftEnd-RightEnd,...) ←
    ... do computation of this phase; when done, do the following ...,
    Left=[X|Left'],
    Right=[X|Right'],
    p_wait(Left'-Right',LeftEnd-RightEnd,...).

p_wait(Left-Right,[X|LeftEnd]-[X|RightEnd],...) ←
    p(Left-Right,LeftEnd-RightEnd,...).

Process creation and termination are handled as before.

Note that the solution is completely symmetric. There is no centralised process that detects
the termination of a phase; rather, the ends of the circuit are distributed to all processes, and each
of them detects the end of a phase independently.

Quiescence  detection 

Consider a network of processes participating in some underlying computation by exchanging
messages. The computation is begun by a designated process, which sends one or more messages to
other processes. Each process that receives a message sends out zero or more messages in response.
No process spontaneously initiates new messages. The computation ends when all messages sent
have been received, and no new response messages need to be generated. Normally, this results in
a deadlock of the underlying computation. We would like to augment the underlying computation,
so that instead of deadlocking it would report quiescence [15,16].

This can be achieved by another variant of the short circuit protocol. In this variant, switches
are embedded in messages, rather than in processes. The initial set of messages is threaded with
a short circuit, as was the initial set of processes above. A process wishing to detect quiescence
holds the ends of the circuit and waits for them to become identical. Each message in the
underlying computation is augmented with a switch, and each process in the underlying computation
is augmented to obey the following protocol. When it absorbs a message, i.e. receives a message
without generating any additional messages in response, it closes the switch in the message. When
it sends one message in response to a message, it includes in the outgoing message the switch of
the incoming message, intact. When it generates n response messages, n>1, it extends the switch
into n switches, and embeds the new switches in the outgoing messages.

For simplicity, assume that each process has one input stream and one output stream of
messages. Mergers and distributors can be attached to these streams if necessary. The schema of
an augmented process is:

p([m(Left-Right,...)|In],Out,...) ←            % Absorb a message
    Left=Right,
    p(In,Out,...).

p([m(Left-Right,...)|In],Out,...) ←            % Send one message
    Out=[m'(Left-Right,...)|Out'],
    p(In,Out',...).

p([m(Left-Right,...)|In],Out,...) ←            % Send many messages
    Out=[m1(Left-Middle1,...),
         m2(Middle1-Middle2,...),
         ...
         mn(Middlen-1-Right,...)|Out'],
    p(In,Out',...).

The invariant of this protocol is that the number of open switches is the number of messages sent
(or to be sent) but not yet received. When this number reaches 0, the short circuit is closed,
and quiescence can be reported. Note that this protocol requires that each message has at most
one receiver. To achieve broadcasting the underlying program must be augmented with explicit
distributors, which follow the same protocol.


Distributed  event-driven  simulation 

One  interesting  application  of  the  above  techniques  is  distributed  event-driven  simulation.  In 
event-driven  simulation,  in  contrast  to  clock-driven  simulation,  only  changes  are  communicated 
between  the  components  participating  in  the  simulation.  This  is  especially  important  in  hardware 
simulation,  where  very  often  only  a  small  percentage  of  the  simulated  device  is  active  at  any  given 
time. 

An event-driven simulation is phased, since changes which occur in the next phase can be
reliably communicated only when all changes related to the previous phase have been received.
The method for phased termination detection, using the stream of short circuits described above,
could be used, except that it requires every process participating in the simulation to be activated
in every cycle in order to close its segment of the short-circuit, contrary to our goal. Our solution
is a combination of the quiescence detection and phased termination detection techniques.

Each message is augmented with a stream of switches and the ends of the short circuit; these
are the same data structures each process is augmented with in phased termination detection. In
addition, each process is augmented to behave as follows. In each phase, the process treats the first
message it receives as follows. It closes the head of its switch, and keeps the tail of the switch and
the circuit's ends. It then waits either for the head of the circuit's ends to close, or for additional
messages. (Note that only one of them can occur, since the head of the circuit's ends closes only
after all messages sent in this phase have been received.) If an additional message is received, it
closes the message's entire switch, after verifying that the message-circuit's ends are identical to
the ones it maintains (this is necessary to ensure that the message belongs to the current phase;
otherwise it is possible that this message was sent by a process that has already detected the end
of the current phase and sent a message belonging to the next phase). If the head of the circuit's
ends closes, it sends zero or more messages, as required by the underlying computation, each with
a segment of the tail of the switch, and with the tail of the circuit's ends.

A  schema  of  such  a  process  follows.  For  simplicity  a  process  which  sends  out  one  message  per 
phase  is  shown. 


p_dormant([m(Left-Right,LeftEnd-RightEnd,...)|In],Out,...) ←     % received first message
    Left=[X|Left'],                                               % acknowledge receipt
    Right=[X|Right'],
    ...,                                                          % process and store message
    p_passive(In,Out,Left'-Right',LeftEnd-RightEnd,...).
p_passive([m(Left1-Right1,LeftEnd-RightEnd,...)|In],
        Out,Left-Right,LeftEnd-RightEnd,...) ←                    % received additional message
                                                                  % of current phase
    Left1=Right1,                                                 % acknowledge receipt
    ...,                                                          % process and store message
    p_passive(In,Out,Left-Right,LeftEnd-RightEnd,...).
p_passive(In,Out,Left-Right,[X|LeftEnd]-[X|RightEnd],...) ←       % detect end of phase
    ...,                                                          % compute outgoing message
    Out=[m(Left-Right,LeftEnd-RightEnd,...)|Out'],
    p_dormant(In,Out',...).

The  reason  for  embedding  the  circuit's  ends  in  messages  is  efficiency.  If  the  ends  were  distributed 
to  all  processes  in  the  network  initially,  a  process  receiving  a  message  after  being  dormant  for  some 
time would have to search for the tail of the ends' streams. In the current scheme it receives the
updated  tails  in  the  message. 

More  details  on  this  subject  can  be  found  in  [158,208]. 

7.6 Object-oriented programming, delegation, and otherwise

Concurrent  logic  programming  languages  naturally  give  rise  to  an  object-oriented  programming 
style,  where  the  objects  are  processes  communicating  via  message  streams.  Much  research  was 
devoted  to  understanding  the  relation  between  classical  object-oriented  concepts  and  techniques 
and  the  object-oriented  style  offered  by  concurrent  logic  programming  [95,169].  For  a  further 
discussion  of  object-oriented  programming  see  Section  21. 

One common object-oriented technique is delegation. A process that does not understand a
certain message delegates it to another process, which may be better equipped to handle it. Consider
a process p(In,...,Out), which receives messages on In. Some messages it handles by itself; others
are delegated to the Out stream. If the set of messages it recognises is simple, say a and b, then
p can be coded easily:

p([a|In],...,Out) ←
    ..., p(In,...,Out).
p([b|In],...,Out) ←
    ..., p(In,...,Out).
p([X|In],...,Out) ←
    X ≠ a, X ≠ b | Out=[X|Out'],
    p(In,...,Out').

However,  if  the  messages  are  complex,  and  have  arguments  which  should  have  specific  combinations 
of  values,  then  the  explicit  specification  of  conditions  under  which  the  message  should  be  delegated 
becomes  harder.  To  that  effect  a  new  guard  primitive,  called  otherwise ,  is  introduced.  The 
operational  semantics  of  otherwise  is  given  assuming  an  ordering  on  clauses  (say  textual  order). 
Given a goal atom G, an otherwise guard in a clause C succeeds if try(G,C') = fail for every
clause C' preceding C.

Using otherwise, defaults can be handled easily:

p([X|In],...,Out) ←
    otherwise | Out=[X|Out'],
    p(In,...,Out').
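The default clause corresponds to a fall-through case in a conventional dispatcher. The following Python sketch illustrates delegation along a chain of two processes; make_process, the handler tables and the message names are all hypothetical, and output streams are replaced by direct calls to the next process.

```python
def make_process(name, handlers, delegate, log):
    """Return a process that handles known message tags and delegates the rest."""

    def process(msg):
        tag = msg[0]
        if tag in handlers:
            log.append((name, handlers[tag](*msg[1:])))
        else:
            delegate(msg)  # otherwise | Out=[X|Out']
    return process


log = []


def unhandled(msg):
    """End of the delegation chain: record messages nobody understood."""
    log.append(("nobody", msg))


parent = make_process("parent", {"area": lambda w, h: w * h}, unhandled, log)
child = make_process("child", {"greet": lambda who: "hello " + who}, parent, log)

for message in [("greet", "world"), ("area", 3, 4), ("shutdown",)]:
    child(message)
print(log)
```

Adding a handler to child requires no change to the delegation case, which mirrors the absence of a textual dependency between an otherwise clause and its siblings.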

Otherwise destroys clause-wise modularity, and the explicit formulation of the conditions under
which it succeeds is often cumbersome.11 This is the source of its power, but also an indication that
it should not be used excessively. Otherwise is best thought of as a primitive exception handling
mechanism, which should be used only to handle exceptions, and not in normal programming
practice.

11 However, K. Kahn (personal communication) notes that there is a sense in which otherwise enables clause
modularity. If a procedure needs to specify a default case, as in this example, which applies when all other
clauses don't apply, then without otherwise it must encode explicitly the negation of the other guards, and
should be updated if the other clauses change. However, by encoding the default with otherwise there is no
textual dependency between the default clause and the other clauses of a procedure.

7.7 Enhanced meta-interpreters

A meta-interpreter for a language L is an interpreter for L written in L. If a language has simple
meta-interpreters, then one of the most convenient ways to enhance the language, or implement
sublanguages, is by starting from a meta-interpreter and enhancing it [60,149,171,178,182]. There
can be several meta-interpreters for a language, which differ in what aspects of the execution model
they reify, i.e. execute themselves, and what aspects they absorb, i.e. default to the underlying
language. The most useful type of meta-interpreter in logic programming is the one that reifies
goal reduction and absorbs unification.

Another distinction is how the meta-interpreter is composed with the program to be
interpreted. One method is to pass a data-structure representing the program as a parameter to the
interpreter. This approach is the most flexible, but usually imposes unacceptable runtime
overhead. At the other extreme, the meta-interpreter and the program to be interpreted can be bound
together at compile time. This may give the most efficient result, especially if source to source
transformation techniques, such as partial evaluation, are applied to the combined program (see
below). This approach, however, is very inflexible.

The  most  common  approach  in  logic  programming,  which  is  also  taken  here,  is  an  intermediate 
one  in  terms  of  efficiency  and  flexibility.  The  program  to  be  interpreted  is  compiled  in  a  special 
way,  and  an  interface  to  the  meta-interpreter  is  provided.  The  interface  determines  which  aspects 
of  the  computation  are  absorbed,  and  hence  compiled  efficiently,  and  which  are  to  be  reified  by  the 
meta-interpreter. 

A  plain  FCP(|)  meta-interpreter 

We demonstrate the approach for FCP(|). Each clause

    A ← G | B

of the FCP(|) program to be interpreted is transformed into the unit clause

    clause(A,X) ← G | X = B'.

where B' is the conjunction obtained by replacing every goal atom G in B whose predicate is
neither true nor "=" by the term goal(G).

For example, the omerge program is represented by clauses like the following:

clause(omerge([X|In1],[Y|In2],Out), B) ← X < Y |
    B=(Out=[X|Out'],goal(omerge(In1,[Y|In2],Out'))).
clause(omerge([ ],In2,Out), B) ← B=(In2=Out).

Given  such  a  representation,  an  FCP(|)  meta-interpreter  can  be  written  as  follows: 

% reduce(Goal) ← Goal is reducible using the program defined by the clause predicate.

reduce(true).                                  %1
reduce(X=Y) ← X=Y.                             %2
reduce((A,B)) ← reduce(A), reduce(B).          %3
reduce(goal(A)) ← clause(A,B), reduce(B).      %4

The meta-interpreter reifies process termination (clause 1), spawning (clause 3) and reduction
(clause 4). Note that the meta-interpreter interprets the parallel processes (A,B) in parallel,
by forking into the two processes reduce(A) and reduce(B). It absorbs unification (clause 2) by
calling FCP(|)'s primitive unification predicate when interpreting a unification goal. It also absorbs
goal/clause matching and guard evaluation, since these are carried out by the clause predicate.
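The division of labour between what is reified and what is absorbed can be seen in a small Python sketch. It is a simplification under stated assumptions: goals are ground tuples, so clause 2 (unification) has no counterpart; the interpreted program is a hypothetical one-predicate countdown supplied by the function clause; and the pool of processes is served in first-in, first-out order rather than concurrently.

```python
from collections import deque


def clause(goal):
    """Hypothetical interpreted program: count(N) reduces to count(N-1) until N is 0.

    Clause selection and guard evaluation are absorbed here, as in the text.
    """
    name, n = goal
    if name != "count" or n < 0:
        raise ValueError(f"no clause applies to {goal!r}")
    if n == 0:
        return ("true",)
    return ("and", ("goal", ("count", n - 1)), ("true",))


def reduce(goal):
    """Run the goal to completion and return the number of reductions performed."""
    pool = deque([goal])
    reductions = 0
    while pool:
        g = pool.popleft()
        kind = g[0]
        if kind == "true":        # clause 1: the process terminates
            continue
        elif kind == "and":       # clause 3: fork into one process per conjunct
            pool.extend(g[1:])
        elif kind == "goal":      # clause 4: reduce with a program clause
            pool.append(clause(g[1]))
            reductions += 1
        else:
            raise ValueError(f"unknown goal form {g!r}")
    return reductions


print(reduce(("goal", ("count", 3))))
```

Because reduction is reified, an enhancement such as counting reductions is a local change to one branch of the interpreter.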


A  termination  detecting  meta-interpreter 

The meta-interpreter described is not too interesting in its own right. However, it may be enhanced
in several ways, to provide useful functionalities. One example is the following meta-interpreter,
employing the short-circuit technique to detect the termination of the interpreted program. On the
call reduce(A,Done), Done is unified with done when the computation of A successfully terminates.

% reduce(Goal,Done) ← Goal is reducible and Done=done.

reduce(A,Done) ←
    reduce'(A,done-Done).

reduce'(true,L-R) ← L=R.
reduce'(X=Y,L-R) ← (X,L)=(Y,R).
reduce'((A,B),L-R) ← reduce'(A,L-M), reduce'(B,M-R).
reduce'(goal(A),L-R) ← clause(A,B), reduce'(B,L-R).

One  of  the  main  weaknesses  of  FCP(|)  is  that,  although  it  can  reflect  on  termination,  it  cannot 
reflect  on  failure,  without  reifying  unification.  In  other  words,  it  is  not  possible  in  FCP(|)  to  enclose 
a  computation  within  a  meta-interpreter  in  the  style  shown  above,  which  reports  failure  when  the 
computation  it  interprets  fails,  without  failing  itself. 

This  problem  is  alleviated  in  more  powerful  languages  such  as  FCP(:),  as  discussed  in  Section 
14. 

An alternative solution is to replace FCP(|)'s unification primitive with a three-argument
predicate,  which  returns  an  indication  whether  unification  succeeded  or  failed.  This  approach  is 
taken  by  Fleng  [133],  and  is  discussed  in  Section  21. 


Interrupt  handling 

Processes in FCP(|) are anonymous. Their number and rate of creation and termination render
any conventional operating system approach to process management infeasible. Therefore the
implementation of standard operating system capabilities, such as the ability to suspend, resume,
and abort processes, requires novel solutions. The natural unit of control in concurrent logic
programming is not a process, but a (reactive) computation.12

In the Logix system [170] several, possibly interacting, computations can proceed
concurrently. We show below a meta-interpreter that can control an interpreted reactive computation by
responding to control signals.


% reduce(Goal,Is) ←
%     Is is a stream of suspend, resume and abort messages. Goal is reducible or Is contains abort.

reduce(true,Is).
reduce(X=Y,Is) ← X=Y.
reduce((A,B),Is) ← reduce(A,Is), reduce(B,Is).
reduce(goal(A),Is) ← clause(A,B,Is), reduce(B,Is).
reduce(A,[I|Is]) ← serve_interrupt([I|Is],A).

serve_interrupt([abort|Is],A).
serve_interrupt([suspend|Is],A) ← serve_interrupt(Is,A).
serve_interrupt([resume|Is],A) ← reduce(A,Is).

12 The notion of computation explored here is closely related to the one used in the semantic definitions, but is different from it in being reactive.

The plain meta-interpreter is enhanced with an interrupt stream Is. Whenever an interrupt is sensed, an interrupt-handling routine is called. The interrupt handler can serve the messages suspend, resume, and abort. To ensure that an interrupt will eventually be served, even if the interpreted computation is non-terminating, the unknown(Is) guard should be added to all but the last clause of reduce. To ensure that even a suspended process responds to an interrupt, an additional clause is added to the representation of the interpreted programs:

clause(A,B,[I|Is]) ← A=B.

Its purpose is to return the interpreted process intact when an interrupt is sensed. If an interrupt is sensed, the clause process terminates and returns in the body argument the goal atom it was called with. This ensures that suspended goal atoms of the interpreted computation are halted rather than being left suspended. Once the computation is resumed, the process is retried. This feature is used for another purpose by the following snapshot meta-interpreters.

Repeated  live  snapshots 

The problem of obtaining a snapshot of the state of a distributed computation has been investigated in various models [14,15]. The meta-interpreter shown above can be enhanced to obtain repeated snapshots of the interpreted computation, by treating the short-circuit as a (possibly empty) stream of snapshot requests. To obtain a snapshot, a message state([ ]) is sent down the left end of the short-circuit. A process P that senses a message state(S) on the left end of its switch sends the message state([P|S]) on the right end of the switch. This is achieved by augmenting the termination detection meta-interpreter shown above with the clause [149]:

reduce(A,[state(S)|L]-R) ← R=[state([A|S])|R'],
    reduce(A,L-R').

When the message state(S) arrives at the right end of the circuit, it contains a list of processes.
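The way the snapshot message accumulates processes can be pictured with a small sequential Python sketch. It is only an illustration under simplifying assumptions: the chain of switches is modelled as an ordinary list, processes are plain strings, and the function name collect_snapshot is hypothetical rather than part of the FCP(|) program.

```python
def collect_snapshot(processes):
    """Pass a state(S) message through a chain of switches, left to right.

    Each process A that senses state(S) on its left end forwards
    state([A|S]) on its right end, as in the snapshot clause of reduce.
    """
    message = []                       # state([ ]) enters at the left end
    for process in processes:          # one switch per interpreted process
        message = [process] + message  # forward state([A|S]) to the right
    return message                     # contents arriving at the right end


snapshot = collect_snapshot(["rev(Xs,Zs)", "append(Zs,[X],Ys)"])
print(snapshot)
```

Because each process prepends itself, the list arriving at the right end holds the processes in the reverse of the order in which the message visited them. The sketch deliberately ignores concurrency, which is exactly where the delicate points discussed next arise.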

There are several delicate points to note. First, as specified, the message is guaranteed to arrive eventually only if the interpreted computation terminates or deadlocks. To improve upon this, the guard unknown(L) can be added to the other clauses of the meta-interpreter. This ensures that if the number of processes created in the computation is bounded (i.e. the number of times a clause with more than one atom in the body is used is finite), then the message would eventually arrive, even if the computation is nonterminating. To obtain a snapshot in a computation with unbounded process creation, the frozen snapshot technique, discussed below, must be used.

Second, the distributed fashion in which the live snapshot was obtained implies that the list of processes obtained is not necessarily a possible state that actually occurred in a computation [158]. For example, process A could have been added to the snapshot, then reduced, performed a unification that enabled some other reduction, which created a process B, which was then added to the snapshot. So the live snapshot may contain two processes which are causally related, and therefore could never exist simultaneously. Furthermore, processes in the snapshot could appear more instantiated than they were when added to it, due to other processes reducing before the snapshot was completed. Nevertheless, under certain circumstances13, a live snapshot is restartable, in the following sense. If G has a successful computation, and G' is a live snapshot of this computation, then G' also has a successful computation (but may also have failing and deadlocked ones). In spite of these limitations live snapshots are useful for various purposes, including the detection of stable properties of networks. This subject is further discussed in [158].

Combining the concepts: interrupt handling, termination detection, and the computation of live and frozen snapshots

We  show  a  meta-interpreter  which  combines  the  various  features  discussed.  It  has  both  an  interrupt 
stream  and  a  short  circuit,  and  it  uses  the  clause  form  of  the  interrupt-handling  meta-interpreter. 

reduce(true,Is,L-R) ← L=R.
reduce(X=Y,Is,L-R) ← (X,L)=(Y,R).
reduce((A,B),Is,L-R) ← reduce(A,Is,L-M), reduce(B,Is,M-R).
reduce(goal(A),Is,L-R) ←
    clause(A,B,Is), reduce(B,Is,L-R).
reduce(A,[I|Is],L-R) ←
    serve_interrupt([I|Is],A,L-R).

serve_interrupt([halt|Is],A,L-R) ← L=R.
serve_interrupt([suspend|Is],A,L-R) ←
    L=[Done|L'], R=[Done|R'],
    serve_interrupt(Is,A,L'-R').
serve_interrupt([resume|Is],A,L-R) ←
    reduce(A,Is,L-R).
serve_interrupt([snapshot|Is],A,L-R) ←
    L=[state(S)|L'], R=[state([A|S])|R'],
    serve_interrupt(Is,A,L'-R').

13 Specifically, in the case of FCP(|), that neither var nor unknown are used in the interpreted program.

The meta-interpreter, called with the goal reduce(G,Is,L-R), can be used to obtain a live snapshot S, even in the presence of unbounded process creation, by providing it with the following input:

Do in parallel:

    Is=[snapshot,resume|Is'], L=[state([ ])|L'], R=[state(S)|R'].

which causes each process to suspend, add its state to the snapshot, and resume immediately.

A frozen snapshot is obtained by suspending the computation, and only then collecting the state of its processes. The following sequence of unifications can be used to get a frozen snapshot and then resume a computation.

Suspend the computation:

    Is=[suspend|Is'], L=[done|L'],
    wait till R=[done|R'],

Take a snapshot:

    Is'=[snapshot|Is''], L'=[state([ ])|L''],
    wait till R'=[state(S)|R''],

Resume:

    Is''=[resume|Is'''].

Specialisation  of  meta-interpreters 

We have shown that an enhanced meta-interpreter is a very convenient tool for specifying functions of computation control. However, a naive implementation of these functions via enhanced meta-interpreters could be quite costly. It is quite common that a program interpreted under an enhanced meta-interpreter runs an order of magnitude slower compared with its direct execution.

One approach to the problem employs the concept of partial evaluation [58,43], first explored in this context by Gallagher [60,61], and refined by others [111,160,149,178,101]. The idea is to specialise the meta-interpreter at compile-time for the execution of a given program.

For  example,  consider  the  following  (inefficient)  FCP(|)  program  for  reversing  a  list: 

rev([X|Xs],Ys) ← rev(Xs,Zs), append(Zs,[X],Ys).
rev([ ],Ys) ← Ys=[ ].

append([X|Xs],Ys,Zs) ← Zs=[X|Zs'], append(Xs,Ys,Zs').
append([ ],Ys,Zs) ← Ys=Zs.

The plain meta-interpreter, specialised to execute this program, is the program itself (although append can be specialised further, see [149]). In [149,150] a partial evaluator for Flat Concurrent Prolog, capable of partially evaluating meta-interpreters, was developed. As there is no partial evaluator for FCP(|), we show here examples of manual specialisations of meta-interpreters. Using partial evaluation techniques similar to those of [149,150], the termination-detection meta-interpreter can be specialised to execute this list reversal program, resulting in the program [149]:




rev([X|Xs],Ys,L-R) ← rev(Xs,Zs,L-M), append(Zs,[X],Ys,M-R).
rev([ ],Ys,L-R) ← (Ys,L)=([ ],R).

append([X|Xs],Ys,Zs,L-R) ← (Zs,L)=([X|Zs'],M), append(Xs,Ys,Zs',M-R).
append([ ],Ys,Zs,L-R) ← (Ys,L)=(Zs,R).

And the interrupt-handling meta-interpreter can be specialised to execute this program, resulting in:

rev([X|Xs],Ys,Is) ← rev(Xs,Zs,Is), append(Zs,[X],Ys,Is).
rev([ ],Ys,Is) ← Ys=[ ].
rev(Xs,Ys,[I|Is]) ← serve_interrupt([I|Is],rev(Xs,Ys)).

append([X|Xs],Ys,Zs,Is) ← Zs=[X|Zs'], append(Xs,Ys,Zs',Is).
append([ ],Ys,Zs,Is) ← Ys=Zs.
append(Xs,Ys,Zs,[I|Is]) ← serve_interrupt([I|Is],append(Xs,Ys,Zs)).

serve_interrupt([abort|Is],A).
serve_interrupt([suspend|Is],A) ← serve_interrupt(Is,A).
serve_interrupt([resume|Is],rev(Xs,Ys)) ← rev(Xs,Ys,Is).
serve_interrupt([resume|Is],append(Xs,Ys,Zs)) ← append(Xs,Ys,Zs,Is).

Note how the state of the interrupted process is passed to the serve_interrupt routine, and that this routine has two clauses, one for resuming rev and one for resuming append.

Such specialisations eliminate the overhead of interpretation, while preserving the functionality of the enhanced meta-interpreter. The transformed programs are usually only 10% to 50% slower than the original programs, depending on the added functionality, compared to the order of magnitude slowdown of naive execution of the interpreter [84].

Techniques for proving the correctness of transformations of concurrent logic programs are not, as yet, well established. One question under debate is whether a transformation should preserve the meaning of a program, including all possible nondeterministic choices, an approach taken by [57,204], or whether a transformation could fix some choices at “compile time” and thus change the meaning of a program; the latter approach views the source program as a specification, which may have several nonequivalent but correct implementations.

PART III. CONCURRENT LOGIC PROGRAMMING LANGUAGES


8.  Language  Comparison 

In  a  trivial  sense  all  reasonable  programming  languages  are  equivalent,  since  they  are  Turing- 
complete  (i.e.  can  simulate  a  Turing  machine,  which  is  a  universal  computational  model).  However, 
if  the  differences  between  languages  were  not  material,  we  would  not  have  invented  so  many  of 
them. 

Concurrent logic languages are similar enough to allow a more precise comparison than is usual among programming languages. They all share the same abstract computational model, share the same principles, and employ very similar syntax. Therefore it is easier to focus on their differences. In comparing languages in this family, we consider mostly expressiveness, simplicity, readability, and efficiency.

In comparing languages for expressiveness, we use two methods: the first is to embed one language in another; the second is to show programming techniques available in one but not in another. We conclude that one language is more expressive, or “stronger”, than another if the latter can be “naturally” embedded in the former, but not vice versa, and/or if all programming techniques of the latter are available in the former, but not vice versa.


We first define the notion of language embedding, which can be used to compare any two languages, and then discuss the finer notion of natural embedding, which is tailored for the comparison of logic programming languages. Related notions of language and their application to the comparison of concurrent logic languages were studied by Saraswat [156] and Levy [114].

Definition:  Language  embedding 

Let $L_1$ and $L_2$ be two languages, $c$ a function from $L_1$ programs to $L_2$ programs, and $v$ a function from observables of $L_2$ to observables of $L_1$. We say that $(c,v)$ is an embedding of $L_1$ in $L_2$ if $v([\![c(P)]\!]_{L_2}) = [\![P]\!]_{L_1}$ for every $L_1$-program $P$. In such a case $c$ is called the compiler of the embedding and $v$ its viewer.

We say that $L_1$ can be embedded in $L_2$ if there are effective functions $c$ and $v$ such that $(c,v)$ is an embedding of $L_1$ in $L_2$. ■

In other words, a compiler $c$ and a viewer $v$ form an embedding of $L_1$ in $L_2$ if the observable behavior of every $L_1$ program $P$ is the same as the observable behavior of the $L_2$ program obtained by compiling $P$ using $c$ and viewing its behavior through $v$.
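The embedding condition can be spelled out as an executable check on a finite sample of programs. The Python sketch below is only an illustration: the two toy "languages" (tuples of integers and tuples of decimal strings), their observables, and all function names are assumptions made for the example, not constructs from this survey.

```python
def is_embedding(compiler, viewer, observe_source, observe_target, programs):
    """Check v([[c(P)]]_L2) == [[P]]_L1 for every sampled L1 program P."""
    return all(
        viewer(observe_target(compiler(p))) == observe_source(p)
        for p in programs
    )


# Toy L1: a program is a tuple of integers; its observables are their set.
def observe_l1(program):
    return frozenset(program)


# Toy L2: a program is a tuple of decimal strings; its observables likewise.
def observe_l2(program):
    return frozenset(program)


def compile_l1_to_l2(program):      # plays the role of the compiler c
    return tuple(str(n) for n in program)


def view_l2_as_l1(observables):     # plays the role of the viewer v
    return frozenset(int(s) for s in observables)


samples = [(), (1,), (3, 1, 2), (7, 7)]
print(is_embedding(compile_l1_to_l2, view_l2_as_l1, observe_l1, observe_l2, samples))
```

A finite sample can only refute an embedding, never establish it for all programs; the check is meant to make the roles of the compiler and the viewer concrete.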

The notion of embedding is rather weak. Because of the Turing-completeness of the languages under consideration, any language $L_1$ can be embedded in any other language $L_2$, by writing in $L_2$ an interpreter of $L_1$, and “compiling” an $L_1$ program $P$ to the $L_2$ program consisting of the interpreter augmented with a representation of $P$.

The real issues with embeddings from $L_1$ to $L_2$ are what is the complexity of the compilation (e.g. how complex is the $L_1$ interpreter written in $L_2$), what is the runtime overhead of compiled programs, how much of the parallelism of $L_1$ is preserved in the compilation, etc. This is usually related to how much of the execution mechanism of $L_1$ needs to be reified in the translation, and how much of it can be absorbed in the execution mechanism of $L_2$.

The basic execution mechanism in logic programming is unification. Therefore we are interested in embeddings from $L_1$ to $L_2$ which absorb unification, i.e. use the unification mechanism of $L_2$ to implement the unification mechanism of $L_1$. In such an embedding, logical variables of $L_2$ represent logical variables of $L_1$ and hence non-ground goals of $L_2$ can be used to represent similar goals of $L_1$. We call an embedding that maps logical variables in one language to logical variables in another natural. We formalise this notion by requiring that the viewer be the identity function on observables containing goals with predicates of the source program (although it may hide auxiliary predicates introduced in the target program, if any). This precludes embeddings in which the compiler encodes variables in the source language by constants of the target language and the viewer decodes the answer substitution given in terms of these constants14.

Definition: An embedding $(c,v)$ between two concurrent logic programming languages is natural if $v$ is defined by:

$v([\![c(P)]\!]) = [\![c(P)]\!]$ ■

The observables of a concurrent logic program $P$, $[\![P]\!]$, were defined in Section 4.2. In the following discussions of embeddings we assume that the viewer is defined as above and hence discuss only the compiler.

In the following we show natural embeddings among concurrent logic programming languages, and argue (although not prove) the lack of opposite natural embeddings. Our findings are summarised in Figure 5. An arrow in the figure indicates the existence of a natural embedding.

Most of the embeddings we show have additional pleasant properties. For example, they are defined clause-wise, and preserve not only the observables but also the behavior in context (the so-called compositional semantics). We do not address these aspects further here.

14 We note, however, that if the target logic language has sufficiently powerful extra-logical constructs, which enable it to examine and compare logical variables, then it can construct its internal variable-value dictionary. Using this dictionary, encoding and decoding can be done internally by the target program. The languages discussed in this survey do not have this capability.
Figure 5: Natural embeddings among concurrent logic programming languages

A second dimension of comparison is simplicity of the syntax and semantics. A simpler language is preferred since it is easier for humans both to grasp and to use, and is more amenable to automatic program analysis and transformation. Usually a weaker language is also simpler, but this is not always so, especially when the difference lies in the granularity of the atomic operations. Usually a language with coarser granularity, i.e. with larger atomic operations, is also stronger. Sometimes, in addition, its transition system is also simpler to define. For example, the languages FCP(|), FGHCav, and FGHCnav discussed below have, progressively, finer granularity and more complicated transition systems15.

A third dimension of comparison is readability. All languages described in this survey use guarded Horn clauses, first employed in the Relational Language [20]. Most of them follow the syntactic conventions of GHC [197], that matching is used in the head, and unification is specified explicitly in the body. Exceptions are FCP(?), P-Prolog, ALPS, and Doc; the impact of their different syntactic conventions on readability is discussed when the languages are introduced.

A fourth dimension of comparison is ease of implementation. In general, the weaker the language the easier it is to implement. In particular, the finer the granularity of the language's atomic operations, the simpler the synchronisation mechanisms required by its parallel implementation. We defer the comparative discussion of implementation to Section 20.

15 The phenomenon is true for the interleaving semantics used in this paper, as well as for the approach to defining semantics for languages with non-atomic variables proposed by Maher [122] and extended by Saraswat [155]. It is conceivable that using another method for defining semantics may result in a different measure of simplicity.

Each of the dimensions mentioned (expressiveness, simplicity, readability, and efficiency) is only one dimension in a multidimensional design space, which usually involves design tradeoffs. For example, a more expressive language may have a more complicated semantics, and be more difficult to implement. A weaker language may need extra-lingual facilities to compensate for its lack of expressiveness. Presently there is no consensus on which language in this design space is optimal as a general purpose programming language for parallel and distributed computers, and several languages are being pursued actively as candidates for this role. Notable efforts, which comprise language design, system design, and sequential and parallel implementations, include KL1 (Flat GHC + control meta-call) [55] and PIMOS at ICOT [18]; PARLOG and its flat variants [162,66,49], and a PARLOG system [48], at the Imperial College of Science and Technology; and Flat Concurrent Prolog [162] and its variants [99], and the Logix system [84,170], at the Weizmann Institute of Science.

For completeness, we provide a historical chart of concurrent logic languages in Figure 6. It is an extension of an earlier chart by Ringwood [145]. In the chart the vertical axis denotes the time at which the language design was published, and an arrow indicates some kind of intellectual influence.


9.  Semantics  of  Concurrent  Logic  Programming  Languages 

In the following sections we investigate several concurrent logic languages. All the flat languages are defined similarly to FCP(|), and assume the same set of guard test predicates. Although small, this set turns out in practice to be sufficient for most practical purposes16. Their state of computation, as well as transitions, are identical to the ones of FCP(|) defined in Section 4.2. The differences between most of the flat languages are captured simply by varying the definition of the clause try functions. Although different, all try functions employed satisfy the following two properties:

a) Suspension is not stable:

If $\mathit{try}(\mathit{Goal},\mathit{Clause}) = \mathit{suspend}$
then there is a substitution $\theta$ such that:
$\mathit{try}(\mathit{Goal}\,\theta,\mathit{Clause}) \neq \mathit{suspend}$.

b) Failure is stable:

If $\mathit{try}(\mathit{Goal},\mathit{Clause}) = \mathit{fail}$
then for every substitution $\theta$:
$\mathit{try}(\mathit{Goal}\,\theta,\mathit{Clause}) = \mathit{fail}$.

Property a implies that a suspended clause try may succeed or fail in the future, if the goal atom is further instantiated (e.g. by reducing other atoms in the goal). Property b implies that a failed clause try need not be tried again.
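Properties a and b can be observed on a small model of a clause try. The Python sketch below is an illustration under explicit assumptions, not the definition from Section 4.2: guards are omitted, terms are strings (variables start with an upper-case letter) or tuples (functor, arg1, ...), goal and head are assumed to share no variable names, the occurs check is left out, and the helper names are hypothetical.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Extend s to unify a and b; return None if they do not unify."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def variables(t):
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(variables(x) for x in t[1:]))
    return set()


def substitute(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(x, s) for x in t[1:])
    return t


def try_clause(goal, head):
    """Return 'fail', 'suspend', or a substitution matching goal to head."""
    s = unify(goal, head, {})
    if s is None:
        return "fail"                  # not unifiable: stable under instantiation
    images = [walk(v, s) for v in variables(goal)]
    if any(not is_var(t) for t in images) or len(set(images)) != len(images):
        return "suspend"               # succeeding would instantiate the goal
    return s


head = ("p", "a", "H")
print(try_clause(("p", "X", "b"), head))
print(try_clause(substitute(("p", "X", "b"), {"X": "a"}), head))
print(try_clause(("p", "c", "Y"), head))
```

The first try suspends because matching would have to bind the goal variable X; instantiating X to a lets the same try succeed, which illustrates property a. The third try fails, and no instantiation of Y can change that, which illustrates property b.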

We  say  that  a  language  is  success  stable  [157]  if  it  satisfies  the  following  property  c: 

c) Success is stable:

If $\mathit{try}(\mathit{Goal},\mathit{Clause}) = \theta'$ for some substitution $\theta'$
then for every substitution $\theta$:
$\mathit{try}(\mathit{Goal}\,\theta,\mathit{Clause}) \notin \{\mathit{suspend},\mathit{fail}\}$. ■

Most languages discussed in this survey, including FCP(|), are success stable if the guard primitives unknown and var are excluded. Exceptions will be noted when introduced.

16 See [170] for a description of the guard predicates and other primitives in a practical system.

The non-flat languages are described only informally. Transition systems for non-flat languages were defined by Saraswat [164,157] and Levy [114].

The notion of language embedding as described in the previous section presupposes a definition of the observables of the source and target language. As discussed in Section 3, since concurrent logic languages employ don’t-care nondeterminism, their observables record the results of failing and deadlocked computations, in addition to the results of successful ones. The observables of a concurrent logic program, in any of the languages surveyed, record the initial state and an abstraction of the final state of every computation.

Compositional semantics for concurrent logic programs that are fully abstract with respect to these observables were investigated by [59,66]. Other investigations of the semantics of concurrent logic languages include [9,44,5930,110,131,304]. However, since the work on the semantics of concurrent logic languages is in a state of flux, we do not survey it here.


10.  Flat  GHC:  A  Language  With  Non-Atomic  Unification 

Flat GHC is the flat subset of the language Guarded Horn Clauses [198,199] (see Section 18). Flat GHC, augmented with a control meta-call primitive discussed in Section 10.3 below, is the basis of Kernel Language 1 [55], the core language of the parallel computer system developed at ICOT as part of the Fifth Generation Project [195].

We consider two variants of Flat GHC: one called FGHCav, for Flat GHC with atomic variables, and the other called FGHCnav, for Flat GHC with non-atomic variables.

FGHCav is derived from the original definition of GHC [198], and it is quite similar to FCP(|). The difference is that in FGHCav a unification specified by the goal $T_1 = T_2$ need not be carried out atomically. In other words, a program in FGHCav cannot specify that a compound unification is to be carried out as an atomic operation. We have found only one implication of this difference in terms of expressiveness: FGHCav requires slightly more elaborate code than FCP(|) to implement the short circuit technique.

FGHCnav has an even finer notion of atomic actions. Intuitively, in FGHCnav, even the instantiation of a variable to a value need not be done atomically, and several occurrences of the same variable can be instantiated to different (conflicting) values simultaneously. If such a conflict occurs it is eventually detected and results in failure. However, in FGHCnav there are intermediate states of the computation in which the same variable may have different values, whereas in FGHCav (and in FCP(|)) this cannot happen.

This property of FGHCnav is a consequence of the principle of anti-substitutability [199,197], also called logical referential transparency17. The principle states, informally, that an occurrence of a variable X can be replaced in any context by a new variable X', provided the equality X = X' is “added” to the context. The principle is motivated by semantic elegance, and it justifies a wide range of program transformations [304]. Operationally, it allows the “decoupling” of different occurrences of the same variable, and instantiating them to different values. In such a case the inconsistency between the instantiations is detected eventually, and failure results.

The main difference in terms of expressiveness between FGHCav and FGHCnav is that in the latter the short-circuit technique cannot be used to detect the successful termination of a computation. The reason is that the closing of the short circuit, both in the original version described for FCP(|), and the variant described for FGHCav below, cannot guarantee that only consistent instantiations have been made. It is still possible that two occurrences of some variable in the computation were instantiated to inconsistent values, which would result in failure past the closing of the circuit.

It seems that without additional facilities, such as the control meta-call discussed below, detection of successful termination of a computation cannot be specified in FGHCnav. In other words, FGHCnav cannot reflect on successful termination, unlike FCP(|) and FGHCav.

17 By Graem Ringwood.

The initial informal description of GHC [198] takes the atomic variables approach, as does the treatment of GHC in [157]. Subsequent theoretical work on GHC embraced the anti-substitutability principle [198], and thus implies non-atomic variables. The practical work at ICOT, however, still adheres to atomic variables: the KL1 language designed and implemented at ICOT is essentially FGHCav augmented with a control meta-call [90].

Although the implementation work on GHC and KL1 adopted the notion of “flatness” employed in this paper, Flat GHC was defined formally only recently [204] under the name “Theoretical Flat GHC”. The notion of flatness used there is a bit different from ours.

In the following we relate FCP(|), FGHCav and FGHCnav with regard to the short circuit technique, and show simple embeddings of FGHCav in FCP(|), and of FGHCnav in FGHCav. The syntax and the try function are the same for FGHCav and FGHCnav. The difference is captured in an additional Anti-substitute transition for FGHCnav, described below.

10.1 The language FGHCav

Syntax 

Definition: An FGHCav program is a finite sequence of guarded clauses that include the unit clause:

X = X

and the clauses:

f(X1,X2,...,Xn) = f(Y1,Y2,...,Yn) ← X1=Y1, X2=Y2,..., Xn=Yn

for every function symbol f/n, n > 0, occurring in some clause whose head predicate is different from ‘=’18. ■

Semantics 

The  fact  that  unification  need  not  be  done  atomically  is  captured  by  the  equality  clauses,  which 
allow  a  compound  unification  to  be  performed  piecemeal. 

The FGHCav try function is defined as follows. Let $C = (X_1{=}X_2 \leftarrow \ldots)$ be a renaming of a unification clause in P.

$\mathit{try}(T_1{=}T_2, C) = \mathit{mgu}((T_1,T_2),(X_1,X_2))$.

The try function for the other clauses of P is defined as for FCP(|).

Notes:

1) The semantics of the unit clause X = X in FGHCav is identical to that of FCP(|), since $\mathit{mgu}((T_1,T_2),(X,X)) = \mathit{mgu}(T_1,T_2)$, if X does not occur in $T_1$ and $T_2$.

2) The atomic operation in FGHCav is assigning a variable to a variable, or assigning a term whose arguments are distinct variables to a variable. The first action is allowed by the unit unification clause; the second by the other clauses. We do not prevent larger atomic actions; we simply do not require them, by permitting smaller ones.

18 We assume here that an initial goal does not contain function symbols which do not occur in the program. Alternatively, an equality clause has to be added for every function symbol allowed in a goal, or a general recursive definition of unification, using the Prolog-like predicates functor and arg, should be used.

Comparison of FGHCav with FCP(|)

The main implication of the lack of atomic unification in FGHC in terms of expressiveness is that FGHCav cannot use the short-circuit technique as specified to detect the termination of a computation. In FCP(|) one can perform the unification of the underlying computation X = Y and close the short circuit L-R atomically, within the same compound unification (X,L) = (Y,R). In FGHCav one needs first to perform the unification X = Y, wait for it to complete using matching, and only then close the short-circuit19. This can be achieved by the procedure unify_and_close_sc(X,Y,L-R), defined using the auxiliary procedure match_and_close_sc as follows:

unify_and_close_sc(X,Y,L-R) ←
    X=Y, match_and_close_sc(X,Y,L-R).

match_and_close_sc(X,X,L-R) ←
    L=R.

Note that to detect the termination of an underlying computation, unify_and_close_sc must be used instead of ‘=’ throughout the underlying program.

The FCP(|) termination detecting meta-interpreter shown in Section 7.7 above is also an FGHC meta-interpreter. However, the unification performed in the body of the clause:

reduce(X=Y,L-R) ← (X,L)=(Y,R).

behaves differently in FCP(|) and FGHCav. The modified version of the short circuit can be used in an FGHCav termination detecting meta-interpreter, by replacing the above clause with the clause:

reduce(X=Y,L-R) ← unify_and_close_sc(X,Y,L-R).

The difference between FCP(|) and FGHC_av can be observed by composing a program that
does the compound unification f(X,Y) = f(a,b) with a program that matches either X = a or Y
= b, then unifies the other variable with c. Such a program is the same in FCP(|) and FGHC:

test(a,Y) <- Y=c.
test(X,b) <- X=c.

If this FCP(|) program were to execute using the goal:
test(X,Y), f(a,b)=f(X,Y)

the terminal state would never contain a substitution in which X := c or Y := c. As an FGHC_av
program, some executions will have X := c and some Y := c, since FGHC_av cannot specify that
the two unifications X = a and Y = b be carried out atomically.
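The observation above can be checked by brute force. The following Python sketch is only an illustration under simplifying assumptions: the store holds just X and Y, a "tell" goal is an equality that either extends the store or fails without trace, and the function name reachable_stores is ours, not part of any of the languages discussed. It enumerates all interleavings of the goal test(X,Y), f(a,b)=f(X,Y), once with the compound unification performed as a single step and once with it split into X=a and Y=b.

```python
# Sketch only: a two-variable store and two kinds of goals, not a real FCP(|)/FGHC engine.
def reachable_stores(atomic):
    """Return every (X, Y) binding pair that occurs in some reachable state."""
    if atomic:
        # f(a,b)=f(X,Y) as one indivisible step
        unify = [("tell", (("X", "a"), ("Y", "b")))]
    else:
        # the same unification split into two independent steps
        unify = [("tell", (("X", "a"),)), ("tell", (("Y", "b"),))]
    start = ((), tuple([("test",)] + unify))
    seen, stack, stores = set(), [start], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        bindings, goals = state
        store = dict(bindings)
        stores.add((store.get("X"), store.get("Y")))
        for i, goal in enumerate(goals):
            rest = goals[:i] + goals[i + 1:]
            if goal[0] == "tell":
                new = dict(store)
                ok = all(new.setdefault(var, val) == val for var, val in goal[1])
                if ok:  # on failure the computation stops and leaves no trace
                    stack.append((tuple(sorted(new.items())), rest))
            else:
                # test(a,Y) <- Y=c.   commits once X is known to be a
                if store.get("X") == "a":
                    stack.append((bindings, rest + (("tell", (("Y", "c"),)),)))
                # test(X,b) <- X=c.   commits once Y is known to be b
                if store.get("Y") == "b":
                    stack.append((bindings, rest + (("tell", (("X", "c"),)),)))
    return stores
```

Under these assumptions the atomic variant never reaches a store that binds a variable to c, whereas the non-atomic variant reaches both Y := c and X := c.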

An  attempt  to  establish  the  difference  in  power  between  languages  with  and  without  atomic 
unification  was  made  by  Saraswat  [156]. 

An embedding of FGHC_av in FCP(|)

FGHC_av can be naturally embedded in FCP(|) using the following compiler. The compiler
translates an FGHC_av program P to FCP(|) clause-wise. In every clause, it replaces every body goal
X=Y with the goal unify(X,Y), where unify/2 is a predicate not occurring in P. It adds to the
resulting program the clause:

unify(X,Y) <- X=Y.

and for every function symbol f/n occurring in P, the clause:

unify(f(X1,X2,...,Xn),f(Y1,Y2,...,Yn)) <-
    unify(X1,Y1), unify(X2,Y2), ..., unify(Xn,Yn).

This  completes  the  description  of  the  compiler. 

As in the definition of the semantics of FGHC_av, the effect of the definition of unify/2 is
that a compound unification can be carried out either atomically, using the first clause, or
non-atomically, using the other clauses.
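The clause-generation step of this compiler is mechanical, as the following Python sketch suggests. It is an illustration only: the function name unify_clauses, the representation of function symbols as (name, arity) pairs, and the variable naming X1, Y1, ... are our assumptions, and symbols of arity 0 are skipped since the first clause already covers them.

```python
def unify_clauses(symbols):
    """Return the text of the unify/2 clauses for a set of (name, arity) pairs."""
    clauses = ["unify(X,Y) <- X=Y."]  # atomic case, same for every program
    for name, arity in sorted(symbols):
        if arity == 0:
            continue  # constants need no decomposition clause
        xs = [f"X{i}" for i in range(1, arity + 1)]
        ys = [f"Y{i}" for i in range(1, arity + 1)]
        head = f"unify({name}({','.join(xs)}),{name}({','.join(ys)}))"
        body = ", ".join(f"unify({x},{y})" for x, y in zip(xs, ys))
        clauses.append(f"{head} <- {body}.")  # argument-wise, non-atomic case
    return clauses
```

For a program whose only function symbol is f/2, the sketch yields the atomic clause followed by a single clause that decomposes f(X1,X2) against f(Y1,Y2).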

19 This method is due to E.D. Tribble.

Syntax

The syntax of FGHC_nav is the same as that of FGHC_av.


Semantics 

The difference between FGHC_av and FGHC_nav, namely the anti-substitutability principle, can
be modelled in our framework by extending the transition system of FGHC_av with the following
transition:

• Anti-substitute:

\[
\langle G;\theta\rangle \;\xrightarrow{\textit{Anti-substitute}}\; \langle (G',\,X{=}Y);\theta\rangle
\]

where Y is a variable that does not occur in G, and G' is obtained by replacing one occurrence
of X by Y in G.


This transition directly models the principle of anti-substitutability. However, as stated, it allows
almost any FGHC_nav program to diverge, by alternating the introduction and elimination of the
equality goals using Anti-substitute and Reduce. To prevent this, additional complicated
fairness conditions need to be incorporated.

Note that the Anti-substitute transition may cause a conditional answer substitution to contain
inconsistent assignments. We have defined substitutions to be functions, i.e. to have a single value
for a variable. This has to be modified in order to specify observables for FGHC_nav.

The difficulty in modelling the semantics of FGHC_nav can be attributed to the need to
accommodate inconsistent constraints on the values of variables. The approach proposed by Maher
[124] and further developed by Saraswat [166,187] suggests another method for modelling this. The
method is to separate the goal atoms into "pools", each containing its own binding environment,
and add explicit transitions which communicate equality constraints between pools. Failure occurs
as soon as one of the pools detects inconsistency.


Comparison of FGHC_av with FGHC_nav

FGHC_av could use the short-circuit technique to detect successful termination of a computation,
albeit with some additional effort. The technique is not applicable in FGHC_nav. It is possible that
the unifications executed by the monitored computation are inconsistent, without this inconsistency
being detected prior to the closing of the short circuit. Thus, unlike in FCP(|) or FGHC_av, the
closing of the short-circuit is not a reliable indication that the computation has not failed.
Technically, the Anti-substitute transition incorporates in the underlying computation unifications
which are not threaded via the short-circuit. Even if the short-circuit closes, the new unifications
introduced by the Anti-substitute transitions can subsequently fail.


An embedding of FGHC_nav in FGHC_av

An embedding of FGHC_nav in FGHC_av consists of a clause-wise compiler and the identity viewer.
For each clause, the compiler iteratively applies anti-substitution to any variable that occurs
more than once in clause body atoms other than equality, until no such variables are left. By
doing so, the compiler "decouples" every variable that may be used for communication, and adds
equalities to clause bodies. These equalities will eventually unify all decoupled occurrences of each
variable.
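The decoupling step can be sketched in Python as follows. The sketch makes several assumptions that are ours rather than the paper's: a clause body is a list of (predicate, arguments) pairs with flat arguments, variables are strings starting with an upper-case letter, and fresh variables are formed by appending a counter (which presumes no such name already occurs in the clause). It performs the renaming in one pass instead of iterating, which gives the same result for flat bodies.

```python
from itertools import count

def is_var(term):
    """Variables are assumed to be strings that start with an upper-case letter."""
    return term[:1].isupper()

def decouple(body):
    """Rename repeated variable occurrences and add the equalities that re-link them."""
    fresh = count(1)
    seen, new_body, equalities = set(), [], []
    for pred, args in body:
        if pred == "=":
            new_body.append((pred, list(args)))  # equality atoms are left untouched
            continue
        new_args = []
        for arg in args:
            if is_var(arg) and arg in seen:
                copy = f"{arg}_{next(fresh)}"      # hypothetical fresh-name scheme
                equalities.append(("=", [arg, copy]))
                new_args.append(copy)
            else:
                if is_var(arg):
                    seen.add(arg)
                new_args.append(arg)
        new_body.append((pred, new_args))
    return new_body + equalities
```

Applied to the body p(X,Y), q(X), r(X,a), the sketch leaves the first occurrence of X in place, renames the other two, and appends two equalities that tie the copies back to X.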


The ability to reflect on the termination and failure of a computation is essential to a systems
programming language, but FGHC_av (and FCP(|)) cannot do the latter, and FGHC_nav can do
neither, without reifying unification. The problem can be solved in two different ways. One is
to strengthen the basic mechanisms of the language. Atomic variables are sufficient to reflect on
termination. To reflect on failure, atomic test unification is needed, as incorporated in the stronger
variants of FCP(|) shown in Sections 14, 15, and 16 below.

Another solution, which was taken by the developers of both GHC and PARLOG, is to add to
the language a meta-level construct, which has "built-in" reflection and control capabilities. There
are several variations on the construct, originally proposed by Clark and Gregory [22]. One variant,
which is referred to in the following as the control meta-call, has the form call(Goal,Signals,Events)
where Signals is a stream of {suspend, resume, abort}, and Events is a stream of {suspended,
resumed, failed(Goal), halted, aborted}, the last three being terminal events.

The intuitive semantics of the control meta-call is as follows. A computation of a goal G
is started under the control meta-call using the goal call(G,In,Out). If some goal atom G' in
the computation fails, the message failed(G') appears on the Out stream. If the computation
terminates, the message halted appears on Out. To suspend the computation, the message suspend
is sent to the In stream, and when suspension occurs the acknowledgement message suspended
appears on Out. Similarly, to resume or abort the computation the message resume or abort is
sent on In, and the corresponding acknowledgement message resumed or aborted appears on Out.
Using the control meta-call, a process in the language can start a computation and monitor it.

We  refer  to  the  language  FGHC„  augmented  with  the  control  meta-call  as  KL1  [55],  The 
actual  meta-call  implemented  as  part  of  PIMOS  [18],  the  KL1  operating  system,  also  includes 
resource management facilities: a computation is allocated some CPU time and some memory,
and  when  either  of  these  is  consumed  it  announces  resource  overflow  and  suspends.  It  can  be 
resumed  by  providing  it  with  additional  resources. 

The control meta-call eliminates much of the freedom of non-atomic variables. For example,
it can be used to detect the successful termination of unification, a capability not present
in FGHC_nav. Hence its implementation restricts the kind of algorithms that can be used in a
distributed implementation of the language; in particular, the algorithms must incorporate some
form of distributed termination detection.

In comparison with the meta-interpreters of FCP(|) shown in Section 7.7, the meta-call
construct reflects on failure, whereas an FCP(|) meta-interpreter cannot. On the other hand, an
FCP(|) meta-interpreter can produce snapshots, whereas the standard meta-call constructs cannot
(although Gregory et al. [67] have proposed an enhanced control meta-call that does). We will
come back to the meta-call when we discuss FCP(:) in Section 14.

Yet  a  third  approach  is  to  construct  a  meta-interpreter  that  reifies  unification,  and  extend  it 
in  various  ways.  A  first  step  in  this  direction  was  taken  by  Tanaka  [182]. 


11. Flat PARLOG: FGHC Extended With Sequential-Or and Sequential-And

The PARLOG language [21], described in Section 18, preceded GHC, but went through several
evolutions that made it closer to GHC [24,86,148]. In the earlier definition [21], referred to as
PARLOG83 by [145], the output mechanism was assignment, rather than unification. In the later
definition [66], referred to as PARLOG86 by [145], the output mechanism was (non-atomic)
unification, as employed by GHC. PARLOG consists of two sublanguages: the single-solution subset,
and the all-solutions subset. The latter is essentially Or-parallel Prolog. Here we concentrate on
the flat subset of the former. The non-flat language is discussed in Section 18. Presently, the main
difference between the computational models of the single-solution subset of PARLOG and GHC
is the sequential-Or and sequential-And constructs of PARLOG. In addition, PARLOG offers a
surface syntax which contains mode declarations. For example, using modes, the PARLOG append
program could be specified as follows:

mode append(?,?,^).

append([X|Xs],Ys,[X|Zs])  <-  append(Xs,Ys,Zs). 
append([  ],Ys,Ys). 

This program is then translated to PARLOG standard form:

append(Xs,Ys,Zs) <-
    Xs <= [X|Xs'] | Zs=[X|Zs'], append(Xs',Ys,Zs').
append(Xs,Ys,Zs) <-
    Xs <= [] | Ys=Zs.

where <= is PARLOG's input matching primitive. This program is operationally identical to the
Flat GHC program:

append([X|Xs],Ys,Zs) <- Zs=[X|Zs'], append(Xs,Ys,Zs').
append([ ],Ys,Zs) <- Ys=Zs.

Several proposals were made for a "flat" subset of PARLOG [66,106,49]. The Flat PARLOG of
Foster and Taylor [49] is essentially Flat GHC with mode declarations as surface syntax. Recently,
a language called Strand [188] was derived by Foster and Taylor from their Flat PARLOG language
by restricting the output mechanism to be assignment, rather than unification. Strand is essentially
a flat version of PARLOG83, with sequential-And and sequential-Or eliminated. PARLOG83 and
Strand are not success stable.

Our definition of Flat PARLOG is based on the KP_AND_Tree language of Gregory [66]. The
language is Flat GHC augmented with sequential-Or and sequential-And. We investigate each of
the two extensions to Flat GHC separately, denoting the resultant languages FP(;) and FP(&).
For the sake of uniformity we use the Flat GHC syntax instead of the PARLOG standard form
syntax. The translation from the Flat PARLOG syntax to the Flat GHC syntax is straightforward.

In the following we refer to the language combining FP(;), FP(&), and the control meta-call
as Flat PARLOG. Although the subject is not discussed explicitly in the PARLOG papers, we
assume that PARLOG has atomic variables, and hence consider Flat PARLOG to be an extension
of FGHC_av rather than of FGHC_nav.


11.1 The language FP(;)

The language FP(;) allows the specification of sequential-Or clauses C1;C2;...;Cn, where each
disjunct Ci is an ordinary guarded clause. The idea of a sequential-Or clause is that the guarded
clause Ci can be selected only if the clause tries of the clauses C1,...,Ci-1 fail. The connective ';' is
called sequential-Or. The similarity of sequential-Or clauses to if-then-else constructs in procedural
languages and to conditionals in Lisp is apparent.

Syntax 

Definition: Sequential-Or clause, program.

• A sequential-Or clause is a guarded clause or has the form C1 ; C2
where C1 is a guarded clause and C2 is a sequential-Or clause.
• An FP(;) program is a set of sequential-Or clauses, augmented with unification clauses as in
FGHC. |


Semantics

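The try_FP(;) function is defined as follows. For a conditional clause C1;C2:

\[
\mathit{try}_{FP(;)}(A,\,C_1;C_2) =
\begin{cases}
\mathit{try}_{FGHC}(A,C_1) & \text{if } \mathit{try}_{FGHC}(A,C_1) \neq \mathit{fail} \\
\mathit{try}_{FP(;)}(A,C_2) & \text{if } \mathit{try}_{FGHC}(A,C_1) = \mathit{fail}
\end{cases}
\]

For a guarded clause C:

\[
\mathit{try}_{FP(;)}(A,C) = \mathit{try}_{FGHC}(A,C).
\]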
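The recursive definition above amounts to scanning the disjuncts from left to right and stopping at the first one whose clause try does not fail. A minimal Python sketch, assuming a sequential-Or clause is given as a list of guarded clauses and that try_clause is a caller-supplied stand-in for the FGHC try function (both assumptions of ours):

```python
FAIL, SUSPEND = "fail", "suspend"

def try_seq_or(goal, disjuncts, try_clause):
    """Clause try for C1;C2;...;Cn, given the clause try for guarded clauses."""
    for clause in disjuncts:
        result = try_clause(goal, clause)
        if result != FAIL:
            # Either a substitution (the disjunct is selected) or SUSPEND.
            # A suspended disjunct blocks all later ones.
            return result
    return FAIL  # every disjunct failed
```

Note that a suspension of an earlier disjunct is returned as is, so a later disjunct is never selected while an earlier one may still succeed.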


An embedding of FP(;) in FGHC_av with otherwise

The embedding consists of a clause-wise compiler. Its method of compiling sequential-Or into
otherwise (introduced in Section 7.6) is similar to the one used by Codish and Shapiro [29] to
translate a non-flat language into a flat one.

The general idea is to reify clause selection, by explicitly programming the commitment
operation. Each sequential-Or clause C1;C2;...;Cm is translated into a different procedure consisting
of m guarded clauses by adding otherwise to the clauses C2,...,Cm. This ensures that a disjunct
can succeed only if all previous disjuncts fail. The head predicates of clauses resulting from each
disjunctive clause are renamed to form the new procedure.

A call to the original procedure is translated into a conjunctive call, one goal for each of the
new procedures. The single-round mutual exclusion protocol shown in Section 7.4 is used to ensure
that at most one of these goals would "commit", i.e. proceed to execute the body of the selected
disjunct of that sequential-Or clause. The other goals terminate quietly without causing any effect.

More specifically, an FP(;) procedure of the predicate p/n with k sequential-Or clauses is
translated into 2k FGHC_av procedures as follows. The i-th conditional clause C1;C2;...;Cm is
translated into the two FGHC_av procedures test_p_i/n+1 and commit_p_i/n+1. Each disjunct Cj =
(p(T1,...,Tn) <- G | B) is translated to the clauses:

test_p_i(T1,T2,...,Tn,Commit) <-
    otherwise, G |
    Commit=lock(Reply),
    commit_p_i(Reply,T1,T2,...,Tn).

commit_p_i(granted,T1,T2,...,Tn) <- B.
commit_p_i(refused,_,...,_).

And the call to p/n is translated to calls to the test_p_i procedures using the clause:

p(X1,X2,...,Xn) <-
    test_p_1(X1,X2,...,Xn,Commit1),
    test_p_2(X1,X2,...,Xn,Commit2),
    ...
    test_p_m(X1,X2,...,Xn,Commitm),
    mutex(Commit1,Commit2,...,Commitm).

where the i-th clause of mutex/m is:

mutex(Commit1,...,lock(Reply),...,Commitm) <-
    Reply=granted,
    Commit1=lock(refused),
    Commit2=lock(refused),
    ...(excluding Commit_i)...
    Commitm=lock(refused).

As the translation shows, there is a close relationship between sequential-Or and otherwise, and it
can be said that they were both designed to solve the same problem. Which construct to prefer is
largely a matter of taste. Both destroy clause-wise modularity and are easily open to abuse, and
therefore should be used sparingly. Sequential-Or is more appealing in being general and uniform.
Otherwise is more restricted (it can be viewed as a special case of sequential-Or [154,156]), as
perhaps appropriate for an exceptional construct, and the cases in which it is less convenient than
sequential-Or for its purpose are rare.

11.2 FP(&)

The language FP(&) is FGHC_av augmented with sequential-And. Adding sequential-And to a
language that supports dynamic creation of processes complicates both the definition and
implementation of the language. In defining the operational semantics, the state of the computation
cannot be represented by a sequence of goals. A tree of alternating sequential-And and parallel-And
nodes, whose leaves contain the goals, is required. The definition of a transition is also complicated
by the constraint that a goal can be selected only if it can be reached from the root by selecting
the left-most branch in every sequential-And node.
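The selection constraint on such a tree can be illustrated by a short Python sketch. The tree encoding is an assumption made for the example only: a leaf is ("goal", name), and an inner node is ("seq", children) or ("par", children).

```python
def selectable(node):
    """Return the goals reachable from the root by taking only the left-most
    branch at every sequential-And node and any branch at parallel-And nodes."""
    kind = node[0]
    if kind == "goal":
        return [node[1]]
    children = node[1]
    if kind == "seq":
        # only the left-most conjunct of a sequential-And may run
        return selectable(children[0]) if children else []
    goals = []  # kind == "par": every conjunct may run
    for child in children:
        goals.extend(selectable(child))
    return goals
```

For the tree corresponding to a, ((b, c) & d), the goals a, b and c are selectable, while d has to wait until the parallel conjunction (b, c) terminates.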

Because of this complication, FP(&) does not fit the semantic framework we described.
Instead, we define the syntax of FP(&), and provide it with semantics by embedding it in FGHC_av,
using the short-circuit technique.

The compiler of the embedding translates each FP(&) program P into an FP(&) interpreter
written in FGHC_av, augmented with the standard clausal representation of P. Since FGHC_av can
be embedded directly in FP(&), using the identity compiler and viewer, this shows that the two
languages are practically identical from an expressiveness point of view.

Syntax 

Definition: FP(&) clause and program.

• An FP(&) clause is a formula of the form

A <- G1,...,Gm | B1,...,Bn.    n,m ≥ 0

where the A and Gi's are as before, and each Bi has the form:

A1 & ... & Ak    (k > 0)
where each Aj is an atom.

• An FP(&) program is a finite sequence of FP(&) clauses. |

Semantics

Let P be an FP(&) program. Translate each clause:

A <- G | B.

of P into the FGHC_av clauses:

clause(A,B'') <- G | B'=B''.

where each unit goal G in B with predicate other than '=' is replaced by goal(G) in B', and

A <- reduce(A).

where clause and reduce are predicates not occurring in P. Call the resulting program P'.

An interpreter I of FP(&) in FGHC_av, which assumes this representation, is defined as follows:

reduce(A) <-
    reduce'(A,done-Done).

reduce'(true,L-R) <- L=R.
reduce'(X=Y,L-R) <- unify_and_close_sc(X,Y,L-R).
reduce'((A,B),L-R) <- reduce'(A,L-M), reduce'(B,M-R).
reduce'(A&B,L-R) <- reduce'(A,done-Done), wait(Done,B,L-R).
reduce'(goal(A),L-R) <- clause(A,B), reduce'(B,L-R).

wait(done,A,L-R) <- reduce'(A,L-R).

unify_and_close_sc(X,Y,L-R) <- See definition in Section 10.

The interpreter implements A & B by executing A and suspending the execution of B until A
terminates. Recursively nested sequential and parallel And's, which may be created by recursive
procedures, are handled correctly, by starting a new short circuit for every sequential component.
The compiler c is defined to map P to P' ∪ I. The viewer v is the identity function on the
predicates of P, and hides the predicates clause and reduce. The observables of an FP(&) program
are then defined to be v([[c(P)]]).

Not only is the direct definition of sequential-And quite complex, but so is its direct
implementation. First, without complex data-structures it may take an unbounded amount of time to
find the next process to execute; the amount is determined by the depth of nesting of sequential
and parallel And's. Second, in a parallel implementation of the language, executing correctly the
conjunct A & B requires performing distributed termination detection on A.

The interpreter of FP(&) in FCP(|) solves the two problems by delegating them to the
underlying implementation of FGHC_av: the process suspension and activation mechanism of FGHC_av
wakes up the wait process when its first argument is instantiated to done. The short circuit
technique combined with the implementation of unification with atomic variables essentially realises
a well-known distributed termination detection algorithm based on distributed counters [158] (see
discussion in Section 7.5).


12. P-Prolog_x — Synchronizing Deterministic Logic Programs


In the languages presented so far synchronization was achieved with matching, specified by clause
heads: a clause try suspends if its matching with the clause head, or checking the guard, suspends.

An alternative approach to synchronization in concurrent logic programming was proposed
by Yang and Aiso [208,210], and incorporated in the language P-Prolog. Although P-Prolog
also incorporates an all-solutions Or-parallel component, we do not discuss it here. We focus on
its other component, which employs a novel synchronization mechanism called exclusive guarded
Horn clauses. We refer to this language subset as P-Prolog_x.

P-Prolog_x does not use matching for synchronisation. It uses goal/clause unification, rather
than matching, and employs the following synchronization principle instead: the reduction of a
goal with a clause is enabled when it can be determined that the reduction with all alternative
clauses has failed. In other words, a process is suspended as long as it has more than one clause to
reduce with. It reduces if it has exactly one clause to reduce with; it fails when it has none. A
process never makes an Or-nondeterministic choice.

The appeal of this synchronization principle is in the following lemma, a variant of which is
due to Maher [124]. The lemma implies that the And-nondeterminism of P-Prolog_x does not affect
the result of computations.


Lemma: Equivalence of P-Prolog_x computations.

If a P-Prolog_x program P has a successful computation from a goal G then every computation of
P from G is successful and the answer substitutions of all such computations are the same (up to
renaming). |


Syntax 

The syntax of P-Prolog_x is the same as that of FCP(|).


Semantics 

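We define the P-Prolog_x try function, try_pp, using the auxiliary function try'_pp. Note that try'_pp
is essentially try_LP augmented with guard evaluation. The program P is an additional parameter
of the functions.

\[
\mathit{try}'_{pp}(A,\,(A' \leftarrow G|B),\,P) =
\begin{cases}
\theta & \text{if } \mathit{mgu}(A,A') = \theta \,\wedge\, \text{checking } G\theta \text{ succeeds} \\
\mathit{fail} & \text{if } \mathit{mgu}(A,A') = \theta \,\wedge\, \text{checking } G\theta \text{ fails} \;\vee\; \mathit{mgu}(A,A') = \mathit{fail} \\
\mathit{suspend} & \text{otherwise}
\end{cases}
\]

\[
\mathit{try}_{pp}(A,C,P) =
\begin{cases}
\theta & \text{if } \mathit{try}'_{pp}(A,C,P) = \theta \,\wedge\, \mathit{try}'_{pp}(A,C',P) = \mathit{fail}
 \text{ for every } C' \in P,\ C' \neq C \\
\mathit{fail} & \text{if } \mathit{try}'_{pp}(A,C,P) = \mathit{fail} \\
\mathit{suspend} & \text{otherwise}
\end{cases}
\]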
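The second definition can be paraphrased operationally: a clause is selected only when its own try succeeds and the tries of all other clauses are known to have failed. The following Python sketch illustrates this under our own assumptions: clauses are compared by equality, and try_prime is a caller-supplied stand-in for the auxiliary function that returns a substitution, FAIL or SUSPEND.

```python
FAIL, SUSPEND = "fail", "suspend"

def try_exclusive(goal, clause, program, try_prime):
    """Exclusive clause try: succeed only if every alternative clause fails."""
    result = try_prime(goal, clause)
    if result == FAIL or result == SUSPEND:
        return result
    # The clause itself is a candidate; check that it is the only one left.
    for other in program:
        if other != clause and try_prime(goal, other) != FAIL:
            return SUSPEND  # another clause may still apply: no Or-choice is made
    return result
```

With three clauses of which two fail, the remaining one is selected; if one of the alternatives is merely suspended, the goal suspends as well.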
Discussion

The advantage of P-Prolog_x is that the order of execution of processes is immaterial, since if a goal
has a successful computation, then all of its computations are successful and produce the same
answer substitution.

The determinism of P-Prolog_x limits it to algorithmic applications, since it cannot implement
system programs such as a stream merger and an interrupt handler (footnote 30). Most algorithmic
concurrent logic programs can be written in P-Prolog_x quite easily, without the need to distinguish
between matching and unification. This implies that some P-Prolog_x programs can be used in more
than one 'mode'. Consider, for example, the P-Prolog_x append program:

append([X|Xs],Ys,[X|Zs]) <- append(Xs,Ys,Zs).
append([ ],Ys,Ys).

This program can be used to append two lists, as usual. However, it can also be used to compute
the difference between a list and its prefix, using, e.g., the call:

append([1,2,3],Ys,[1,2,3,4,5,6]).

This  is  possible  since  at  most  one  clause  head  unifies  with  the  initial  goal,  as  well  as  with  subsequent 
goals,  and  hence  goal  reduction  can  proceed. 

The practical advantage of this 'multiple-mode' ability is questionable. In practice, few logic
programs are used in more than one mode. When they are, the two common modes are output
generation and testing, which can be employed by all other concurrent logic languages mentioned,
rather than inverting the roles of input and output within a single clause, which is unique to
P-Prolog_x and its superset ALPS (and is available in a more restricted sense also in FCP(?) and
FCP(:,?) introduced below).

Furthermore, P-Prolog_x uses unification in the head. As mentioned in the discussion of FCP(?)
in Section 15 below, this generality seems to impede program readability and maintainability, since
often the intended mode of use is known and fixed, but is not communicated by the code.

Embedding P-Prolog_x in FCP(|)

The implementation of P-Prolog_x is not trivial. A naive implementation would be to try all clauses
whenever a process reduction is attempted; return to the successful clause if only one exists, or
suspend on all variables instantiated during clause tries if there were more than one successful clause
try. The overhead of this scheme seems unacceptable. An efficient implementation of P-Prolog_x
seems to require a complete analysis of all possible call patterns, which is also quite complex.

To establish the relation between P-Prolog_x and other languages in the family, we show here
an embedding of P-Prolog_x in FCP(|). The idea of the embedding is as follows. For each goal
atom in the source program we create a controlling process, and for each source clause potentially
unifiable with this atom we create a reduction process simulating the attempt to reduce the goal
atom with the clause. The reduction process operates as follows. If it detects that it cannot
perform the simulated goal/clause reduction, it informs the controller. If it receives a permission
from the controller to reduce, it simulates the reduction.

The controlling process counts the number of clause try failures, and when all but one clause
have failed, it permits the remaining one to try and reduce. This behavior is achieved by the
following translation (footnote 31).

Each P-Prolog_x clause A <- G|B is translated into an FCP(|) procedure with three clauses.
The purpose of the first clause is to fail as soon as it is determined that the goal atom does not
unify with the head of the source clause or the guard fails. It can never succeed. The second clause
informs the controller if the first clause has failed. The third clause reduces if permission is given
from the controller.

30 Although an ad hoc extension to allow this was proposed [210]. Another extension, ALPS, is discussed in the
next section.

31 This translation was developed in collaboration with M. Maher, and benefited from comments by V.A. Saraswat.

Specifically, let C1,C2,...,Ck be the clauses of the P-Prolog_x procedure p/n. It is translated
into k+1 FCP(|) procedures, p/n, p_1/n+2, p_2/n+2, ..., p_k/n+2, which use two auxiliary
procedures, as follows. The i-th clause p(T1,T2,...,Tn) <- G|B of the P-Prolog_x procedure p/n is
translated into the FCP(|) procedure:

p_i(T1,T2,...,Tn,_,foo) <- G | true.

p_i(_,_,...,_,Failed,_) <- otherwise | Failed=failed.

p_i(X1,X2,...,Xn,go,_) <- G | (X1,X2,...,Xn)=(T1,T2,...,Tn), B.

The FCP(|) procedure p/n is defined as follows:

p(X1,X2,...,Xn) <-
    p_1(X1,X2,...,Xn,S1,_),
    p_2(X1,X2,...,Xn,S2,_),
    ...
    p_k(X1,X2,...,Xn,Sk,_),
    xor_k(S1,S2,...,Sk).

where xor_k is defined as follows:

xor_k(Go,failed,failed,...,failed) <- Go=go.
xor_k(failed,Go,failed,...,failed) <- Go=go.
...
xor_k(failed,failed,...,failed,Go) <- Go=go.

with Go on the diagonal, and failed anywhere else.

The translated program operates as follows. The procedure p/n spawns k parallel clause
processes p_i, one for each of the original p/n clauses, plus an xor_k process. If the i-th clause process
fails it unifies the S_i variable with failed. The xor_k process counts k-1 failures, and unifies go with
the remaining variable, which enables the remaining clause process to reduce if it has not failed yet.
Note that the FCP(|) program fails whenever the source P-Prolog_x program fails.
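The behaviour of the controller can be mimicked by a small Python sketch. It is a model under our own assumptions, not the FCP(|) code itself: the signal variables are a list whose entries are None while unbound and "failed" once a clause process has reported failure, and one call corresponds to one attempt of the xor_k process to reduce.

```python
def xor_k(signals):
    """One reduction attempt of the controller over the signal variables.

    Returns the index that received "go", or "suspend" while fewer than
    k-1 failures are known, or "fail" when all k clause processes failed.
    """
    k = len(signals)
    unbound = [i for i, s in enumerate(signals) if s is None]
    failed = sum(1 for s in signals if s == "failed")
    if failed == k:
        return "fail"  # no clause is left; the body unification Go=go cannot succeed
    if failed == k - 1 and len(unbound) == 1:
        signals[unbound[0]] = "go"  # permit the single remaining clause process
        return unbound[0]
    return "suspend"
```

With three clause processes of which the second and third have failed, the first receives go; while two signals are still unbound the controller suspends.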

The translation assumes that the unification implied by P-Prolog_x's Reduce transition is
atomic. If it is non-atomic, then the exact same embedding can be used with FGHC_av or FGHC_nav
as the target language, depending on the kind of non-atomicity allowed.


13. ALPS — An Integration of P-Prolog_x and FGHC

ALPS was proposed by Maher [124] as an algorithmic concurrent logic programming language.
The ALPS goal reduction rule states that a goal can be reduced with a clause if either this is the only
candidate clause left (the P-Prolog_x rule), or the reduction does not instantiate variables of the
goal (the FGHC and FCP(|) rule).

In particular, the FGHC unification primitive is definable in ALPS using the single unit clause:
X = X

The reduction of the goal T1 = T2 with this clause is enabled if T1 and T2 are unifiable, using
the P-Prolog_x rule, since this is the only candidate clause. Unlike FCP(|), and like FGHC, the
unification specified by such a goal need not be carried out atomically. In particular, the transition
system of ALPS defined by Maher realises non-atomic variables, as in FGHC_nav.

ALPS  was  defined  in  the  general  setting  of  constraint  logic  programming  [92];  we  address  this 
aspect  of  the  language  in  Section  21. 
Embedding FGHC_nav and P-Prolog_x in ALPS

FGHC_nav can be embedded in ALPS using a compiler that duplicates each clause, and the identity
viewer. Clause duplication prevents the resulting ALPS program from "eagerly" reducing using
the determinacy rule, since no goal is ever determinate (footnote 22). P-Prolog_x can be embedded
using the embedding into FCP(|), shown in the previous section, assuming unification need not be
carried out atomically. ALPS can be embedded in FGHC much the same way that P-Prolog_x was
embedded in FCP(|).

Discussion 

The transition rules of ALPS are more 'eager' than those of FGHC. This means that some programs
which deadlock as FGHC programs may proceed as ALPS programs. The practical implications
of this difference are yet to be determined. The benefits in terms of added expressiveness are
unclear, and the comments on P-Prolog_x apply here as well. In addition, the difficulties in efficient
implementation of the ALPS language, compared with FGHC, seem substantial.


14. FCP(:) — FCP(|) Extended With Atomic Test Unification

In FCP(|), FGHC and Flat PARLOG, a program can perform only matching prior to clause
selection. In the next set of languages shown, FCP(:), FCP(?), and FCP(:,?) (footnote 23), a program
can perform unification as part of the test for clause selection, prior to commitment. If the unification
fails, it should leave no trace of its attempted execution; in other words, the unification attempt
should be atomic. We call unification which is tried before commit atomic test unification. In
FCP(|), atomic unification is a special predicate. In FCP(:) and FCP(:,?) it is definable, and in
this sense these languages are natural generalisations of FCP(|).

The first flat language to combine input matching and atomic test unification is Saraswat's
FCP(↓,|) [150,154]. This idea was generalized by Saraswat in the Ask-and-Tell framework [156],
which gave rise to the languages cc(↓,|) [156,157] and the similar language FCP(:) [100] described
below.

14.1 The language FCP(:)

Syntax 

Definition:  FCP(:)  clause  and  program. 

•  An  FCP(:)  clause  has  the  form: 

A ← Ask : Tell | Body.

where Ask and Tell are possibly empty conjunctions of atoms, Ask atoms have guard test
predicates, and Tell contains only equality atoms. If Tell is empty, the colon is omitted.

• An FCP(:) program is a sequence of FCP(:) clauses.

The effect of a clause try of a goal A with an FCP(:) clause with an empty tell part is the same
as in FCP(|). If the tell part is not empty, the effect is as follows. First, the goal/head input
matching and the guard checking are performed. If they fail or suspend, the clause try fails or
suspends, respectively. If they succeed, then the unification specified by the tell is performed,
which can either succeed or fail, but not suspend. If it succeeds, the result of the clause try is
the substitution combining the ask substitution and the tell substitution. If it fails, the clause try
fails.

22 The clause duplication method is due to M. Maher.

23 More precise but also more cumbersome names for these languages are, respectively, FCP(:,|), FCP(?,|) and
FCP(:,?,|).

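Definition: Try function for FCP(:).

• Let $\mathit{Tell} = (X_1 = Y_1, \ldots, X_n = Y_n)$ be a conjunction of equality atoms. We define
$\mathit{mgu}(\mathit{Tell}) = \mathit{mgu}((X_1, \ldots, X_n), (Y_1, \ldots, Y_n))$, and the try function to be:

\[
\mathit{try}(A, (A' \leftarrow \mathit{Ask} : \mathit{Tell} \mid B)) =
\begin{cases}
\theta \circ \theta' & \text{if } \mathit{match}(A,A') = \theta \;\wedge\; \text{checking } \mathit{Ask}\,\theta \text{ succeeds} \;\wedge\; \mathit{mgu}(\mathit{Tell}\,\theta) = \theta' \\
\mathit{fail} & \text{if } \mathit{mgu}(A,A') = \mathit{fail} \;\vee \\
 & \quad (\mathit{mgu}(A,A') = \theta \;\wedge\; \text{checking } \mathit{Ask}\,\theta \text{ fails}) \;\vee \\
 & \quad (\mathit{match}(A,A') = \theta \;\wedge\; \text{checking } \mathit{Ask}\,\theta \text{ succeeds} \;\wedge\; \mathit{mgu}(\mathit{Tell}\,\theta) = \mathit{fail}) \\
\mathit{suspend} & \text{otherwise}
\end{cases}
\]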
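The try function can be made concrete with a small sketch. The following Python code is an illustration only, under stated assumptions: terms are modelled as tuples and atoms, the names `Var`, `walk`, `mgu`, `match`, `try_clause` and `lock_clause` are hypothetical, the guard is passed as a Python function, and unification omits the occurs check. As a simplification, the sketch suspends as soon as input matching suspends, without first checking whether the guard could already be shown to fail.

```python
FAIL, SUSPEND = "fail", "suspend"

class Var:
    """Logical variable; local=True marks a variable of the clause, not of the goal."""
    def __init__(self, name, local=False):
        self.name, self.local = name, local
    def __repr__(self):
        return self.name

def walk(t, s):
    """Dereference t through the bindings in substitution s."""
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def mgu(a, b, s):
    """Two-way unification (no occurs check); returns an extended substitution or FAIL."""
    a, b = walk(a, s), walk(b, s)
    if a is b:
        return s
    if isinstance(a, Var):
        return {**s, a: b}
    if isinstance(b, Var):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = mgu(x, y, s)
            if s == FAIL:
                return FAIL
        return s
    return s if a == b else FAIL

def match(goal, head, s):
    """Input matching for terms already known to unify: binds clause variables only,
    and returns SUSPEND if a goal variable would have to be instantiated."""
    goal, head = walk(goal, s), walk(head, s)
    if goal is head:
        return s
    if isinstance(head, Var) and head.local:
        return {**s, head: goal}
    if isinstance(goal, Var) or isinstance(head, Var):
        return SUSPEND
    if isinstance(goal, tuple):
        for g, h in zip(goal, head):
            s = match(g, h, s)
            if s == SUSPEND:
                return SUSPEND
    return s

def try_clause(goal, head, ask, tell):
    """Clause try: ask(theta) returns True, False or SUSPEND; tell is a list of (X, Y) pairs."""
    if mgu(goal, head, {}) == FAIL:
        return FAIL
    theta = match(goal, head, {})
    if theta == SUSPEND:
        return SUSPEND
    verdict = ask(theta)
    if verdict == SUSPEND:
        return SUSPEND
    if not verdict:
        return FAIL
    for x, y in tell:                 # substitutions are never mutated, so a failed
        theta = mgu(x, y, theta)      # tell leaves no trace of its attempt
        if theta == FAIL:
            return FAIL
    return theta

def lock_clause(goal):
    """Try the clause p(ME,I) <- true : ME=I | ... with fresh clause variables."""
    me, i = Var("ME'", local=True), Var("I'", local=True)
    return try_clause(goal, ("p", me, i), lambda theta: True, [(me, i)])
```

With a goal `("p", ME, 1)` whose first argument is an unbound goal variable, `lock_clause` returns a substitution that binds `ME` to `1`; with `("p", 1, 2)` the tell part fails and so does the clause try.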

Embedding of FCP(|) in FCP(:)

The embedding of FCP(|) in FCP(:) is trivial. All the compiler does is to replace the unification
clause X = X by the clause

X=Y ← true : X=Y | true.

This clause is necessary since '=' is a primitive in FCP(|) but not in FCP(:).

14.2 Programming in FCP(:)

Atomic  test  unification  enables  numerous  programming  techniques  not  available  in  any  of  the 
weaker  languages  introduced  so  far.  These  include  multiple  writers  on  shared  variables,  which 
can  be  used  to  realize  sophisticated  synchronization  protocols  and  blackboard-like  shared  data 
structures;  the  ability  to  reflect  on  failure  of  unification,  which  enables  the  construction  of  failsafe 
meta-interpreters  that  can  be  used  to  realize  the  control  meta-call;  the  ability  to  record  the  logical 
time  in  which  a  unification  occurs,  which  is  essential  for  computation  replay  and  hence  essential 
to  concurrent  algorithmic  debugging;  and  the  ability  to  simulate  Prolog’s  test  unification,  and 
hence the ability to naturally embed Or-parallel Prolog and similar languages. We discuss these
techniques  below. 

Mutual  exclusion  and  multiple-writer  streams 

Using atomic test unification, single-round mutual exclusion can be achieved with less machinery
than needed in FCP(|). Let p1, ..., pn be the processes wishing to participate in a single-round
mutual exclusion protocol, with unique identifiers I1, ..., In. Add to each process an argument, and
initialize all processes with this argument being the variable ME. Each process pk competing for a
lock nondeterministically attempts to unify its identifier Ik with ME, or to check that ME is already
instantiated to some I ≠ Ik.

A schematic description of each process is as follows. The kth process call is p(ME,Ik,...).

p(ME,I,...) ← true : ME=I | ... lock granted ...
p(ME,I,...) ← ME ≠ I | ... lock denied ...

This technique is not a substitute for the multiple-round mutual exclusion protocol shown in Section
7. However, in the special case that in each round the number of competing processes decreases
by one, it can be generalized, as follows.

Assume a set of processes p1, ..., pk, where each pi may wish to deposit a message mi on
a shared stream Ms. Furthermore, assume that the messages are pairwise not unifiable. One
solution is to create a merge network for all these processes. However, if the number of processes
actually wishing to deposit their message on the stream is much smaller than k (as is the case with
exceptional message streams), this solution is very wasteful. A more efficient solution in this case is
to extend the single-round mutual exclusion protocol above to streams, as follows. When wishing
to deposit a message on Ms, the process nondeterministically attempts to do so, or to check that
another message is already there. In the second case it calls itself recursively with the tail of the
stream. Assuming each process pi is called with Ms as its first argument and mi as its second, the
code of a process is as follows:

p(Ms,M,...) ← true : Ms=[M|_] | ... message sent; do other things ...
p(Ms,M,...) ← Ms=[_|Ms'] | p(Ms',M,...).

Using this protocol, if the number of messages to be placed on Ms is finite, every process wishing
to place a message on Ms will eventually do so (assuming And-fairness).
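The multiple-writer stream protocol can be imitated in a conventional setting. The Python sketch below is an analogy rather than an FCP(:) implementation: the names `Cell`, `tell`, `deposit`, `contents` and `run_writers` are hypothetical, and a single lock stands in for the atomicity that FCP(:) gives to the tell part of a clause. Applying `tell` to one cell only corresponds to the single-round mutual exclusion protocol.

```python
import threading

class Cell:
    """One cell of a shared stream; unbound until some writer binds it."""
    _atomic = threading.Lock()    # stand-in for the atomicity of test unification

    def __init__(self):
        self.head, self.tail = None, None    # tail is None while the cell is unbound

    def tell(self, msg):
        """Atomically try Ms=[msg|_]; True if the cell now holds msg."""
        with Cell._atomic:
            if self.tail is None:
                self.head, self.tail = msg, Cell()
            return self.head == msg

def deposit(ms, msg):
    """Walk down the stream until msg has been placed; return the remaining stream."""
    while not ms.tell(msg):
        ms = ms.tail              # another message is already there: try the tail
    return ms.tail

def contents(ms):
    """Collect the messages placed on the stream so far."""
    out = []
    while ms.tail is not None:
        out.append(ms.head)
        ms = ms.tail
    return out

def run_writers(messages):
    """Start one writer per message on a fresh stream and return what ends up on it."""
    ms = Cell()
    threads = [threading.Thread(target=deposit, args=(ms, m)) for m in messages]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return contents(ms)
```

The order of messages on the stream depends on scheduling, but every distinct message appears exactly once. Two writers depositing equal messages produce a single entry, which is why the text requires the messages to be pairwise not unifiable.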

The  dining  philosophers 

The  seminal  problem  of  mutual  exclusion  is  that  of  the  dining  philosophers  [37].  In  this  problem 
n  philosophers  are  sitting  at  a  round  table,  with  one  fork  between  each  two  philosophers.  To  eat, 
a  philosopher  requires  two  forks.  Each  philosopher  goes  through  a  cycle  of  eating  and  thinking. 
The  problem  is  to  provide  the  philosophers  with  an  algorithm  that  guarantees  that  they  will  not 
deadlock,  and  that  no  philosopher  will  starve. 

Using atomic test unification on multiple-writer streams it is easy to specify a deadlock-free
behavior for a philosopher:

phil(Id,[eating(LeftId,done)|Left],Right) ←        % Left is eating, wait till
    phil(Id,Left,Right).                            % he is done.
phil(Id,Left,[eating(RightId,done)|Right]) ←       % Right is eating, wait till
    phil(Id,Left,Right).                            % he is done.
phil(Id,Left,Right) ←
    true : Left=[eating(Id,Done)|Left'],            % Atomically grab both forks
           Right=[eating(Id,Done)|Right'] |
    ... eat, when done unify Done=done,
    then think, then become:
    phil(Id,Left',Right').

The program is independent of the number of philosophers dining. A dinner of n philosophers can
be specified by the goal:

phil(1,Fork1,Fork2), phil(2,Fork2,Fork3), ..., phil(n,Forkn,Fork1).

whose execution results in each of the Fork variables being incrementally instantiated to a stream
of terms eating(Id,done), with the Ids on each Fork reflecting the order in which its two adjacent
philosophers use it. For example, a partial run of this program on a dinner of 5 philosophers
provided the substitution:

Fork1 = [eating(1,done), eating(5,done), eating(1,done), eating(5,_) | _]
Fork2 = [eating(1,done), eating(2,done), eating(1,done), eating(2,_) | _]
Fork3 = [eating(3,done), eating(2,done), eating(3,done), eating(2,_) | _]
Fork4 = [eating(3,done), eating(4,done), eating(4,done), eating(3,done),
         eating(4,done) | _]
Fork5 = [eating(4,done), eating(4,done), eating(5,done), eating(4,done),
         eating(5,_) | _]

The run was suspended midstream in a state in which Fork4 is free and the 2nd and 5th
philosophers are eating. Up to that point each of the philosophers ate twice, except 4 which ate
three times.

This program is much simpler than the Parlog86 program for the dining philosophers in [145].
The key to its simplicity is indeed the ability of FCP(:) to specify atomic test unification: a
philosopher atomically tries to grab both forks, excluding other philosophers from grabbing them.
The mutual exclusion is obtained by unifying the head of the Fork stream with a term containing
the unique Id of the philosopher.

The deadlock-freedom of the program is guaranteed by the language semantics. The program
can be further enhanced to achieve starvation freedom as well.
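The effect of grabbing both forks in one atomic step can be illustrated with a sequential simulation. The Python sketch below is an assumption-laden analogy, not the FCP(:) program: the function `dinner`, its random scheduler and its data layout are hypothetical, philosophers are numbered from 0, and each fork stream is a list of entries whose shared `done` flag plays the role of the shared Done variable.

```python
import random

def dinner(n, steps, seed=0):
    """Run `steps` scheduling steps for n philosophers; return (forks, meals)."""
    rng = random.Random(seed)
    forks = [[] for _ in range(n)]     # forks[k]: stream shared by philosophers k-1 and k
    eating = {}                        # philosopher -> entry it currently holds
    meals = [0] * n
    for _ in range(steps):
        k = rng.randrange(n)           # the scheduler picks some philosopher
        left, right = forks[k], forks[(k + 1) % n]
        if k in eating:
            eating.pop(k)["done"] = True          # Done=done frees both forks at once
        elif all(not f or f[-1]["done"] for f in (left, right)):
            entry = {"id": k, "done": False}      # one shared Done variable
            left.append(entry)                    # both forks are taken in a single
            right.append(entry)                   # step, or neither is
            eating[k] = entry
            meals[k] += 1
    return forks, meals
```

Because a philosopher either takes both forks or takes none, no philosopher ever waits while holding a fork. That is the property behind the deadlock-freedom claim. Like the program in the text, the sketch does nothing to prevent starvation.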

The  duplex  stream  protocol 

Processes  placing  messages  on  a  shared  stream  need  not  be  competing;  they  can  also  cooperate, 
and  use  the  shared  stream  for  both  communication  and  tight  synchronisation. 

For example, consider a stream producer and a stream consumer, wishing to participate in
the following interaction. When the consumer reads the stream, it wants to read all the messages
produced so far by the producer. The producer produces messages asynchronously, but wishes to
know whenever all messages it has produced so far have been read. This can be achieved using
the following duplex stream protocol [152]. The producer places a message M on the stream
wrapped as write(M). The consumer, when reaching the end of the stream, places on it a read
message. From the consumer's point of view, successfully placing a read on the stream indicates
that it has read all messages produced so far. From the producer's point of view, failing to place
a write(M) message, due to the existence of a read message, is an indication that all previous
messages have been read. This is realised by the following code, where produce(M,Ms,Ms',Status)
places the message M on Ms, returning the remaining stream Ms', and Status=new if all messages
previous to M have been already read, Status=old otherwise. consume(Ms,Ms',Rs) returns in Rs
the messages ready in Ms, and in Ms' the remaining stream.

produce(M,Ms,Ms',Status) ← true : Ms=[write(M)|Ms'] | Status=old.
produce(M,[read|Ms],Ms',Status) ← Ms=[write(M)|Ms'], Status=new.

consume([M|Ms],Ms',Rs) ← consume'([M|Ms],Ms',Rs).

consume'(Ms,Ms',Rs) ← true : Ms=[read|Ms'] | Rs=[ ].
consume'([write(M)|Ms],Ms',Rs) ← Rs=[M|Rs'], consume'(Ms,Ms',Rs').

consume is two-staged so that it would not place a read message on an initially empty stream.

If  the  producer  waits  every  so  often  for  the  consumer  to  catch  up,  then  consume  always 
terminates. 
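The bookkeeping of the duplex stream protocol can be traced in a sequential sketch. The Python code below is illustrative only and rests on assumptions not in the original: the shared stream is an append-only list, each party keeps an integer position in place of a stream tail, and a suspended `consume` is modelled by returning `None` for the batch.

```python
def produce(m, ms, pos):
    """Place write(m) at the producer's end of ms; return (new position, status)."""
    status = "old"
    if pos < len(ms):             # the end cell is already bound, and only the
        pos, status = pos + 1, "new"   # consumer's read message can be there
    ms.append(("write", m))
    return pos + 1, status

def consume(ms, pos):
    """Read every message from pos to the end, then place a read message."""
    if pos == len(ms):
        return pos, None          # empty stream: suspend rather than place a read
    batch = []
    while pos < len(ms):
        batch.append(ms[pos][1])  # unwrap write(M)
        pos += 1
    ms.append("read")
    return pos + 1, batch
```

A producer that receives the status `new` learns that everything it sent before the current message has been read, without a separate acknowledgement per message.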

The duplex protocol gives rise to a much more efficient and more flexible bounded-buffer
protocol than the FCP(|) protocol shown in Section 7.3. It is more efficient, since there is no
acknowledgement for every message, only one per 'batch'. It is more flexible, since the producer can
change its mind on how many messages to send without an acknowledgement, without consulting
or affecting the consumer, and with no need to change 'buffer-size'.

CSP  with  both  input  and  output  guards 

To  demonstrate  the  power  of  atomic  test  unification,  we  show  an  FCP(:)  simulation  of  CSP  with 
output guards [87]. CSP with output guards is notoriously difficult to implement, and hence Occam
[91],  the  practical  realisation  of  CSP,  adopts  only  input  guards.  It  is  interesting  to  note  that  a 
logic  programming  language  with  matching  is  sufficient  to  simulate  CSP  with  input  guards,  but 
a  language  with  both  matching  and  atomic  test  unification  seems  to  be  required  to  simulate  CSP 
with  both  input  and  output  guards. 

Consider two sets of processes p1, ..., pn and c1, ..., cn, wishing to participate in the following
interaction. Some (possibly all) of the pi's wish each to interact with exactly one of the cj's, but
they do not care which. Some (possibly all) of the cj's wish each to interact with exactly one
of the pi's, but they do not care which. We would like a protocol such that, if there are i ≤ n p's and
j ≤ n c's willing to interact, then min(i,j) pairs will do so. The protocol should be independent
of i and j, and allow i and j to increase dynamically.

The protocol is as follows [148]. Each p willing to interact sends to all the c's the incomplete
message hello(X). All messages sent by the same p have the same variable X, and the variables
in messages sent by different p's are distinct. Each c willing to interact does the following: it
nondeterministically and atomically selects one of its incoming hello(X) messages and unifies X
with its unique Id.

The program for the case of two p's and two c's is as follows:

p(X,ToC1,ToC2) ← ToC1=hello(X), ToC2=hello(X).

c(Id,hello(X1),_) ← true : Id=X1 | true.
c(Id,_,hello(X2)) ← true : Id=X2 | true.

The initial process network is:

p(X1,M11,M12), p(X2,M21,M22), c(a,M11,M21), c(b,M12,M22).

This process network terminates, and at the end of its execution exactly one of X1 and X2 will be
instantiated to a, and the other to b.

In this example the two p's and two c's were both willing to interact. However, the definition
of p and c is applicable also in the more general case, in which fewer are willing to interact on each
side, or in which processes are added dynamically.

This demonstration of the power of atomic test unification also indicates that the distributed
implementation of atomic test unification is far from being trivial. It is discussed in Section 20.
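The pairing behaviour of this protocol can be checked with a small sequential sketch. The Python code below is an analogy under stated assumptions: `LogicVar` and `interact` are hypothetical names, the c processes run one after another (which trivially gives each selection the atomicity that FCP(:) provides), and a seeded shuffle stands in for nondeterministic clause selection.

```python
import random

class LogicVar:
    """Write-once variable; bind() plays the role of the atomic test unification Id=X."""
    def __init__(self):
        self.value = None

    def bind(self, value):
        if self.value is None:
            self.value = value
        return self.value == value

def interact(num_p, consumer_ids, seed=0):
    """Each willing p offers hello(X) to every c; each c commits to one still-free X."""
    rng = random.Random(seed)
    offers = [LogicVar() for _ in range(num_p)]     # one variable X per willing p
    for cid in consumer_ids:
        inbox = offers[:]                           # every c receives every hello(X)
        rng.shuffle(inbox)                          # nondeterministic clause selection
        any(x.bind(cid) for x in inbox)             # stop at the first successful bind
    return [x.value for x in offers]
```

With i offers and j consumers, exactly min(i,j) of the returned values are bound, each to a distinct consumer identifier.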

Otherwise and reflection on failure

In  FCP(|)  it  is  possible  to  prevent  failure  of  user-defined  processes,  by  appending  to  each  procedure 
p  the  clause: 

p(...) ← otherwise | ... report failure ...

However, there is no way to prevent the failure of the primitive unification process '='.

In  FCP(:),  on  the  other  hand,  since  unification  is  definable,  it  is  possible  also  to  define  failsafe 
unification  using  the  clauses: 

X = Y ← true : X=Y | true.
X = Y ← X ≠ Y | ... report failure of unification ...

More generally, it is possible to define a failsafe FCP(:) meta-interpreter, which, instead of failing
when the interpreted program fails, simply reports the failure. To achieve this we modify the clause
representation of the interpreted program, by appending to it the clause:

clause(A,B) ← otherwise | B = failed(A).

Using this representation, a termination detecting failsafe meta-interpreter for FCP(:) is defined
as follows:

reduce(A,Result) ← reduce'(A,[ ]-Result).

reduce'(true,L-R) ← L = R.
reduce'((A,B),L-R) ← reduce'(A,L-M), reduce'(B,M-R).
reduce'(failed(A),L-R) ← R=[failed(A)|L].
reduce'(goal(A),L-R) ← clause(A,B), reduce'(B,L-R).

On a call reduce(A,Result), Result is instantiated to the (possibly empty) stream of goals failed
during the computation. The stream is closed when the computation terminates.
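The way the short circuit L-R collects failed goals can be mirrored by threading an accumulator through a sequential interpreter. The Python sketch below is only an illustration: the goal encoding, the dictionary `program` and the helper `_reduce` are assumptions, reductions happen sequentially rather than concurrently, and a missing dictionary entry stands in for the otherwise clause.

```python
def reduce(goal, program):
    """Return the list of goals that failed while reducing `goal`."""
    return _reduce(goal, program, [])

def _reduce(body, program, acc):
    if body is True:                      # reduce'(true,L-R): L = R
        return acc
    tag = body[0]
    if tag == "and":                      # thread the accumulator through A, then B
        return _reduce(body[2], program, _reduce(body[1], program, acc))
    if tag == "failed":                   # R = [failed(A)|L]
        return [body[1]] + acc
    # tag == "goal": look the goal up; a missing entry plays the role of `otherwise`
    name = body[1]
    return _reduce(program.get(name, ("failed", name)), program, acc)
```

For a program in which `p` reduces to the conjunction of `q`, `r` and `s`, and only `q` is defined, the call `reduce(("goal", "p"), program)` returns `["s", "r"]`: the goal that failed last sits at the front, as in the clause R=[failed(A)|L].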

14.3  Embedding  KL1  and  Flat  PARLOG  in  FCP(:) 

The inability to reflect on failure without reifying unification made all the previous languages unable
to implement the control meta-call efficiently. Therefore, to make them practical, this construct has
to be introduced as a primitive into the language, as discussed in Section 10.3.

We show how the control meta-call can be implemented in FCP(:), and thus provide an
embedding of KL1 in FCP(:). Combined with the techniques used to embed FP(;) and FP(&) in
FGHC_av, discussed in Section 11, the implementation of the control meta-call can be enhanced to
provide an embedding of Flat PARLOG in FCP(:).

An  implementation  of  the  control  meta-call  in  FCP(:) 

The meta-call implementation consists of two components: a meta-interpreter, which can produce
events and is sensitive to interrupts, and a computation monitor, which provides the user interface.

The meta-interpreter requires the same clause representation of the FCP(|) interruptible
meta-interpreter shown in Section 7.7, augmented with the otherwise clause shown above and
an interrupt-sensitive clause. Each FCP(:) clause (including the unification clause)

A ← Ask : Tell | B.

is translated into:

clause(A,X,Is) ← Ask : Tell | X=B'.

where B' is B transformed as in previous meta-interpreters, and two clauses are appended:

clause(A,B,Is) ← otherwise | B=failed(A).
clause(A,B,[I|Is]) ← A=B.

The first reports failure of a reduction attempt. The second aborts the attempt when sensing an
interrupt. Note that the order of the last two clauses is important: if they were switched, then
the meta-level process clause(A,B,Is) executing a failing object-level process A would suspend on
the interrupt stream Is rather than reporting failure. This is another demonstration of both the
subtlety and power of otherwise.

Using this representation, the following meta-interpreter achieves the desired behavior:

reduce(true,Is,Ss,L-R) ← L=R.
reduce((A,B),Is,Ss,L-R) ← reduce(A,Is,Ss,L-M), reduce(B,Is,Ss,M-R).
reduce(goal(A),Is,Ss,L-R) ←
    clause(A,B,Is), reduce(B,Is,Ss,L-R).
reduce(failed(A),Is,Ss,L-R) ←
    write(failed(A),Ss), L=R.
reduce(A,[I|Is],Ss,L-R) ←
    serve_interrupt([I|Is],A,Ss,L-R).

write(M,Ms) ← true : Ms=[M|_] | true.
write(M,[M'|Ms]) ← write(M,Ms).

The differences between this and the snapshot meta-interpreter of FCP(|) shown in Section 7.7
above are the additional signals stream Ss, the clause added for handling process failure and the
lack of a special clause for unification. The latter is not needed since the clause defining '='
is a normal FCP(:) clause which requires no special treatment. Failure is handled by placing
an appropriate message on the signals stream, using the multiple-writer stream protocol. This
is an example where creating a merger for each forked process would have had an unacceptable
overhead. Assuming either a low rate of process failure, or that the computation is suspended by the
controller as soon as failure is detected, the multiple-writer protocol would exhibit a much better
performance.

Note that if two unifiable processes fail, only one message is produced on the signals stream.
This oddity can be solved either by allocating unique identifiers to the meta-interpreter processes
(which is inelegant and quite expensive), or, in FCP(:,?), using the anonymous mutual exclusion
protocol, discussed in Section 16.

An alternative solution which does not use a multiple-writer stream is to use the short-circuit in
order to report failure, as in the failsafe FCP(:) meta-interpreter shown above. The disadvantage of
this approach is that the list of failed goals will be seen only upon termination of the computation.
A computation monitor, which suspends the computation as soon as a failed goal is sensed, cannot
be programmed using this technique.

The definition of the computation monitor should be quite obvious now. Its top level is the
same as the meta-call, call(Goal,Signals,Events). It invokes the meta-interpreter, keeping hold of
the ends of its short-circuit streams. It serves signals coming from the outside by forwarding them
to the meta-interpreter, via the interrupt stream, and reports on events that happen during the
computation by placing them on the Events stream.

The meta-interpreter given serves as a specification of the required functionality of the control
meta-call. This functionality can be implemented by source to source transformation. The
transformation presently employed in the Logix system [84] which achieves this functionality results in
about 30% increase in runtime and 80% increase in code size. In [46], Foster reports an
experimental study that quantifies the cost of direct support for metacontrol functions, and compares
this with the cost of support by program transformation. The same paper describes extensions to
an existing abstract machine [49] required to support these functions. This study indicates that
direct support for the control meta-call need not be expensive, nor require complex implementation
mechanisms.

Discussion:  atomic  test  unification  vs.  non-atomic  unification 

It  is  a  subject  of  ongoing  debate  whether  it  is  preferable  to  have  a  stronger  language  which  can 
embed  meta-level  functions  such  as  the  control  meta-call,  or  to  have  a  weaker  language  and  provide 
specific  meta-level  functions  as  language  extensions. 

The issue seems to be a tradeoff between simplicity at the implementation level versus elegance
and expressiveness at the language level. On one side of the debate are Flat GHC and Flat
PARLOG, with non-atomic unification. On the other side are the languages FCP(↓,|), FCP(:),
FCP(?), and FCP(:,?), with atomic test unification.

The main arguments for a weaker language, with non-atomic unification and a built-in control
meta-call, are:

• The base language is simpler to implement.

• The specialised meta-level construct can be added with less overhead than via a
general-purpose language mechanism.

• The base language has simpler formal semantics, and is therefore better amenable to
theoretical treatment such as verification and transformation.

• Atomicity of unification is not assumed by the theory of (pure) logic programming. Therefore,
it is important to write programs without relying on atomic unification whenever possible, and
a language with non-atomic unification encourages it. The resulting programs allow better
declarative reading (footnote 24).

The main arguments for a stronger language, which has atomic test unification and can
implement meta-level constructs via interpretation and transformation, are:

• Providing semantics for any specific meta-level construct as part of the base language is both
complicated and ad hoc (we know of no formal semantics for the control meta-call or similar
constructs, other than the one implied by the semantics of FCP(:) combined with the definition
of the control meta-call).

• The need for stronger meta-level constructs is continuously evolving (e.g. live and frozen
snapshots, sophisticated debuggers, etc. which are not provided by the control meta-call).
If these needs are met at the language definition level, rather than by interpretation and
transformation, the language semantics as well as implementation have to be continuously
modified.

• When atomic test unification is not employed, there is little or no runtime penalty compared
to implementations of the weaker languages.

• Should the efficiency of a direct implementation of a certain meta-level function be required,
it can be provided without affecting the language semantics. Such a direct implementation
can be viewed as a (possibly hand-coded) specialisation of a function that could be provided
by the language itself.

• There are other applications in which the added strength of atomic test unification is employed,
such as embedding Or-parallel Prolog [165], and debugging (see below).

• It is not obvious at present that the semantics of the weaker languages is indeed simpler.

24 These last two points were communicated by K. Ueda.

Recently, Saraswat has proposed combining both atomic test unification and non-atomic
unification in a single language [157]. Such a language inherits the complexities of both approaches,
and it is not clear at present what performance gains it allows.

14.4  Computation  replay  and  debugging 

One type of bug that is most difficult to diagnose in concurrent programs is the transient, or
lurking, bug. Once a bug occurs in a sequential deterministic language, it is possible to repeat the
computation and analyze it with various tools. This is not always possible in a concurrent program,
unless special measures are taken. Specifically, all communication and all nondeterministic
(scheduler and program) choices made during a computation must be recorded, so that if an erroneous
behavior is observed, the computation can be repeated.

We show an FCP(:) meta-interpreter that records scheduler and program (i.e. And- and
Or-nondeterministic) choices made by the interpreted program. This information is sufficient in order
to reconstruct closed (non-reactive) computations, in which all communication happens internally.
The meta-interpreter computes a tree data-structure called a trace, which reflects the process
reductions that occurred in the computation. Each node in the trace contains the pair (Time,Index),
with the time in which the process in that node reduced, and the identity of the clause used for
reduction. Given an initial goal and a trace of its computation, the computation can be repeated
by redoing the process reductions specified by the trace in the order specified by the Time field of
each node, and for each reduction selecting the clause specified by the Index of its node.

To construct such a trace, we assume that the underlying machine maintains logical clocks
[107], and that the language provides a new primitive, time(Time), which unifies Time with the
present value of the local logical clock. The clause representation is modified, to provide additional
information on the clause reduction: the logical time in which it took place, and the identity of
the clause chosen (footnote 25).

The ith clause A ← Ask : Tell | B of the program is transformed into the clause:

clause(A,X,Index,Time) ← Ask : Tell, time(Time) | X = B', Index = i.

Using this representation, a meta-interpreter that constructs a trace is defined as follows:

reduce(true,true).
reduce((A,B),T) ← T=(T1,T2), reduce(A,T1), reduce(B,T2).
reduce(goal(A),T) ← T=trace(Index,Time,SubTrace),
    clause(A,B,Index,Time), reduce(B,SubTrace).

A  computation  reconstructor,  which  repeats  a  computation  given  an  initial  goal  and  a  trace,  can 
be  written  quite  elegantly  using  incomplete  data-structures.  It  first  serialises  the  trace  using  the 
Time  field,  then  executes  the  reductions  in  order,  one  by  one.  We  do  not  show  it  here. 
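The trace structure and its serialisation by the Time field can be sketched as follows. This Python code is an illustration under stated assumptions: `record` and `serialise` are hypothetical names, a counter stands in for the logical clock, the caller-supplied `choose` function stands in for Or-nondeterministic clause selection, and reductions run sequentially. It does not attempt the full reconstructor, which would also have to walk the trace tree alongside the goal.

```python
import itertools

def record(body, program, clock, choose):
    """Reduce `body`, returning a trace of the reductions performed."""
    if body is True:
        return True
    if body[0] == "and":                           # reduce((A,B),T): T=(T1,T2)
        return (record(body[1], program, clock, choose),
                record(body[2], program, clock, choose))
    name = body[1]                                 # body is ("goal", name)
    clauses = program[name]
    index = choose(name, len(clauses))             # Or-nondeterministic choice
    time = next(clock)                             # logical time of this reduction
    return ("trace", index, time, record(clauses[index], program, clock, choose))

def serialise(trace):
    """Flatten a trace into (Time, Index) pairs ordered by Time."""
    if trace is True:
        return []
    if trace[0] == "trace":
        return sorted([(trace[2], trace[1])] + serialise(trace[3]))
    return sorted(serialise(trace[0]) + serialise(trace[1]))
```

Replaying then amounts to performing the reductions in the serialised order and, for each one, selecting the clause named by its Index instead of choosing afresh.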

Given  the  ability  to  reconstruct  a  computation,  algorithmic  debugging  techniques  [159]  can 
be  applied  to  concurrent  programs  as  well.  See  [176,118,120]  for  details. 

25 Inability to record the time in which a unification occurs is what prevents the weaker languages shown from
replaying computations.

14.5 An embedding of Or-parallel Prolog in FCP(:)

The question of how to provide the capabilities of Prolog in a concurrent logic programming
language has received considerable attention since the beginning.

One  approach  was  pursued  by  PARLOG  [24,25,86],  namely  to  provide  two  sublanguages  with 
an  interface:  the  single-solution  sublanguage,  which  is  the  counterpart  of  other  concurrent  logic 
programming languages, and the all-solutions sublanguage, which is essentially an all-solutions
Or-parallel  Prolog.  A  stream-like  interface  allows  single-solution  programs  to  invoke  and  control 
all-solution  programs. 

Another approach was to embed Prolog in a concurrent logic language. The first success in
this direction was Kahn's Or-parallel Prolog interpreter in Concurrent Prolog, discussed in Section
18. However, this interpreter relies in an essential way on the non-flat nature of Concurrent
Prolog. Initial attempts by Ueda [200,201] and Codish and Shapiro [29] were successful in
producing efficient translations when the mode of unification of the source Prolog program could
be determined at compile time. A more general, but less efficient, solution is described in [165], in
the form of an Or-parallel Prolog interpreter written in FCP(?), a language introduced in Section
15. Although originally written in FCP(?), the interpreter does not exploit properties of it not
available in FCP(:), and can be easily converted to this language. This implementation is not
as direct as the interpreter in Concurrent Prolog, but is still quite simple. Furthermore, if the
mode of subprograms can be determined, the interpreter can be gracefully interfaced to programs
implemented using the transformations proposed by Ueda. The execution algorithm employed by
the interpreter was proposed independently for other purposes [3,4,28]; nevertheless, its practicality
is still under debate.

This embedding employs atomic test unification to implement Prolog’s unification. Hence,
unlike the disjoint-sublanguages approach or the mode-based compilation, which are applicable to any
concurrent logic language, the embedding approach is not applicable to languages such as FCP(|),
(Flat) GHC, and (Flat) PARLOG. Should the execution algorithm employed by the embedded-
language approach prove efficient in practice, its advantage over the disjoint-sublanguages approach
would become apparent, especially in the presence of specialized hardware for the execution of
concurrent logic languages [5,74].

A variant of the algorithm can be implemented also in Flat PARLOG or KL1, using the control
meta-call. However, such an implementation would be hopelessly inefficient, since it would require
a new meta-call at every choice point, and cannot prune alternatives using test unification as done
in direct implementations of Prolog or in the Prolog interpreter in FCP(?).

Another approach, pursued by Saraswat [150,153,157] and Yang and Aiso [209,210], was to
incorporate don’t-know nondeterminism in concurrent logic languages. As the resulting languages
cannot specify reactive concurrent systems, they are not an extension of, or a substitute for, concurrent
logic languages. Assuming that an underlying reactive concurrent logic language is still desired,
the problem of integrating it with a parallel don’t-know nondeterministic logic language is much
the same as that of integrating Prolog: it can either be implemented separately, with some
all-solutions interface, as in the two-sublanguages approach, or it can be compiled into a concurrent
logic language, as in the embedding approach. This is discussed further in Section 21.


15.  FCP(?)  —  Dynamic  Synchronization  With  Read-Only  Variables 

The language Concurrent Prolog [160] introduced a different approach to synchronisation, using
read-only variables and read-only unification. The approach is preserved in its flat subset Flat
Concurrent Prolog [128], also known as FCP, and called throughout the paper FCP(?) (read
“FCP read-only”).

FCP(?) assumes two types of variables, writable (ordinary) variables and read-only variables,
and uses read-only unification, which is an extension of ordinary unification, to unify terms
containing read-only variables. The read-only operator, ?, is a mapping from writable to read-only
variables. When applied to a writable variable X, the read-only operator yields a corresponding
read-only variable X?. The read-only operator is the identity function on terms other than writable
variables.

In the absence of read-only variables, read-only unification is just like ordinary unification.
However, a read-only variable X? cannot be unified with a value. An attempt to unify X? with
a term other than a writable variable suspends. When the writable variable X is instantiated to
some value T (by some concurrent unification) its corresponding read-only variable X? receives
the value T?. This may release a unification suspended in an attempt to unify X? with some
value.

Whereas synchronisation with matching is specified clause-wise and statically, synchronisation
with read-only unification is specified term-wise and dynamically. Read-only unification can be
used to achieve various forms of dynamic synchronisation, not achievable otherwise.

15.1  The  language 
Syntax 

The  syntax  of  FCP(?)  is  the  same  as  FCP(|),  except  that  a  clause  may  contain  read-only  variables. 
Semantics 

The  semantics  of  the  language  is  similar  to  FCP(|),  except  that  goals  may  contain  read-only 
variables,  and  the  goal  and  the  clause  head  are  unified  using  read-only  unification  instead  of 
matching. 

Definition: Admissible substitution, read-only extension, read-only mgu.

•  A substitution θ is admissible if X?θ = X? for every variable X.

•  The read-only extension of an admissible substitution θ is the unique idempotent substitution
   θ? satisfying Xθ? = Xθ and X?θ? = (Xθ)? for every writable variable X.

•  The read-only mgu, mgu?, of two terms T1 and T2 is defined by:

$$
\mathit{mgu}?(T_1,T_2) =
\begin{cases}
\theta? & \text{if } \mathit{mgu}(T_1,T_2) = \theta,\ \theta \text{ admissible} \\
\mathit{fail} & \text{if } \mathit{mgu}(T_1,T_2) = \mathit{fail} \\
\mathit{suspend} & \text{otherwise}
\end{cases}
$$

For example, {X↦a} is admissible but {X?↦a} and {X↦a, X?↦a} are not. The read-only
extension of {X↦a, Y↦Z} is {X↦a, X?↦a, Y↦Z, Y?↦Z?}. mgu?(f(X,Y), f(a,Z)) =
{X↦a, X?↦a, Y↦Z, Y?↦Z?}, and mgu?(X?,a), mgu?(f(X,X?), f(a,a)), and mgu?(f(X,X?),
f(a,b)) all equal suspend.

The try function of FCP(?) is the same as that of FCP(|), except that it uses mgu? instead of
match, and it returns suspend if the read-only unification of the goal and the head is inadmissible
due to a read-only goal variable, and fail if it fails or is inadmissible due to a read-only clause variable
only (since the latter state is stable).
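
The definition of mgu? can be made concrete with a small sketch. The Python below is an illustration only and is not taken from any FCP implementation: the names Var, Fn, mgu and mgu_ro are hypothetical, read-only variables are treated as ordinary distinct variables while the mgu is computed, and a variable-to-variable unification is assumed to bind the writable side, so that unifying X? with a writable Y succeeds. The singular cases (such as unifying X with a term containing X?) are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    ro: bool = False          # ro=True models the read-only variable X?


@dataclass(frozen=True)
class Fn:
    name: str
    args: tuple = ()          # a constant is a function symbol with no arguments


def read_only(t):
    """The ? operator: maps a variable to its read-only twin, identity elsewhere."""
    return Var(t.name, True) if isinstance(t, Var) else t


def walk(t, s):
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (isinstance(t, Fn) and any(occurs(v, a, s) for a in t.args))


def resolve(t, s):
    t = walk(t, s)
    if isinstance(t, Fn):
        return Fn(t.name, tuple(resolve(a, s) for a in t.args))
    return t


def mgu(a, b, s):
    """Ordinary mgu; X and X? are simply two different variables here."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    # First try to bind a writable variable, then (as a last resort) a read-only one.
    for want_ro in (False, True):
        for v, t in ((a, b), (b, a)):
            if isinstance(v, Var) and v.ro == want_ro:
                return None if occurs(v, t, s) else {**s, v: t}
    if a.name != b.name or len(a.args) != len(b.args):
        return None
    for x, y in zip(a.args, b.args):
        s = mgu(x, y, s)
        if s is None:
            return None
    return s


def mgu_ro(t1, t2):
    """Read-only mgu: a substitution (dict), 'fail', or 'suspend'."""
    s = mgu(t1, t2, {})
    if s is None:
        return "fail"
    theta = {v: resolve(v, s) for v in s}
    if any(v.ro for v in theta):          # theta binds some X?: not admissible
        return "suspend"
    ext = dict(theta)                     # read-only extension of theta
    for v, t in theta.items():
        twin = Var(v.name, True)
        if read_only(t) != twin:
            ext[twin] = read_only(t)
    return ext


X, Y, Z, a, b = Var("X"), Var("Y"), Var("Z"), Fn("a"), Fn("b")
f = lambda *args: Fn("f", args)
assert mgu_ro(f(X, Y), f(a, Z)) == {X: a, read_only(X): a, Y: Z, read_only(Y): read_only(Z)}
assert mgu_ro(read_only(X), a) == "suspend"
assert mgu_ro(f(X, read_only(X)), f(a, a)) == "suspend"
assert mgu_ro(a, b) == "fail"
```

The assertions at the end replay the examples given with the definition; the choice to bind the writable variable first is what keeps a unification such as X? = Y admissible in this sketch.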

26 This definition of read-only unification is different from the original one [160], in that it is order
independent and disallows “self-feeding”, i.e. the success of f(X,X?) = f(a,a). The revision was influenced
by criticism of the earlier definition [162,197], and by the language CP(%) of Ramakrishnan and
Silberschatz [163].

15.2  FCP(?) programming techniques

Standard programming techniques

All standard programming techniques shown for FCP(|) and FCP(:) are realisable also in FCP(?).
However, for most of the simple synchronisation tasks, the generality and the dynamic nature of
read-only unification turn out to be more of a burden than an asset. Since read-only unification
is an extension of unification, using it for goal/clause unification is closer to the original model of
logic programming and Prolog. Nevertheless, in concurrent logic programming, matching is used
more often than unification. The default in FCP(?) encourages programmers to use unification
even when matching is needed, and instead restrict the use of the procedure by placing read-only
variables in the caller. For example, consider the FCP(?) procedure append.

append([X|Xs],Ys,[X|Zs]) ← append(Xs?,Ys,Zs).
append([ ],Ys,Ys).

The procedure is almost identical to the logic program (and Prolog program) append. The only
difference is the read-only annotation in the recursive call. Nevertheless, this program has awkward
behavior. Although its head specifies unification, the intention is that the first argument be
matched. The program ensures this for recursive calls, but not for the initial call. If the initial goal
is append(Xs,Ys,Zs) rather than append(Xs?,Ys,Zs), the first (or second) clause can be chosen
erroneously. Placing this responsibility on the caller is a source of non-modularity and bugs. In
addition, matching can be compiled more efficiently than unification [99]. Without global analysis,
which infers that the caller always places a read-only variable in the appropriate position [30,186],
an FCP(?) program would compile less efficiently than a corresponding program in a language
with input unification.

A  later  definition  of  FCP  [170]  allowed  both  a  matching  predicate  =?=  and  unification  in  the 
guard.  Using  a  matching  guard,  the  recursive  clause  of  append  could  be  specified  as: 

append(Xs,Ys,[X|Zs]) ← Xs =?= [X|Xs'] | append(Xs',Ys,Zs).

However,  since  this  syntax  is  more  verbose  than  the  default  one,  programmers  would  still  use 
the  previous  style,  resulting  in  programs  which  are  both  more  error  prone  and  less  efficient.  In 
addition,  it  turned  out  to  be  difficult  to  define  cleanly  the  try  function  for  guards  which  contained 
a  free  mix  of  matching  and  unification  predicates  [173]. 

It seems, therefore, that the approach taken by the other flat languages, namely to use
matching as the default, is better. The language FCP(:,?), discussed in the next section, attempts to
unify the expressiveness of FCP(?) with the more convenient and efficient programming style of
the other languages.

Test-and-set

One use of read-only variables is to implement various forms of a test-and-set operation. A variable
can be tested to be a variable and then set to a non-variable term T in two stages: first unify it
with a new read-only variable X?, and if successful unify T with X:

test_and_set(X?,T) ← X=T.

The definition of read-only unification implies that the clause try will succeed with the goal
test_and_set(X,T) if and only if X is a variable at the time of the try. The technique directly
generalises to simultaneous test-and-set of several variables.

The ability to implement test-and-set implies that FCP(?) is not success stable. For example,
test_and_set(X,a) succeeds with X instantiated to a, but test_and_set(a,a) fails.
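
The behaviour of test-and-set can be imitated outside a logic language. The sketch below is only an analogy, assuming a hypothetical LogicVar class in which a lock stands in for the atomicity of the clause try: the operation succeeds only if the variable is still unbound at the moment of the attempt, and a second attempt with the same value fails, mirroring the fact that the operation is not success stable.

```python
import threading


class LogicVar:
    """A single-assignment variable with an atomic test-and-set (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()   # stands in for the atomic clause try
        self.bound = False
        self.value = None

    def test_and_set(self, value):
        """Bind the variable and return True only if it was still unbound."""
        with self._lock:
            if self.bound:
                return False
            self.bound, self.value = True, value
            return True


x = LogicVar()
assert x.test_and_set("a")        # X was a variable: succeeds, X = a
assert not x.test_and_set("a")    # X is now a: the same request fails
```

When several threads race on the same LogicVar, exactly one call returns True, which is the property the mutual-exclusion techniques of this section rely on.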

We note that test-and-set can be also realised in FCP(:), augmented with the var guard
primitive, but not in any of the weaker languages.

Anonymous  mutual-exclusion,  multiple-writer  streams  and  distributed  queues 
The ability to test-and-set can be used to implement anonymous mutual exclusion, that is, mutual
exclusion without unique identifiers. For example, a multiple-writer stream, which preserves
message multiplicity even in the presence of unifiable messages (in contrast to the FCP(:) program
shown in Section 14 above) can be defined as follows:

write(M,[X?|Ms],Ms) ← M=X.
write(M,[_|Ms],Ms') ← write(M,Ms,Ms').

The third argument Ms' can be used to place subsequent messages on the stream. It ensures that
the next message is placed after the previous one, so a writer can ensure that its own messages are
ordered. Even if a writer placing several messages on a stream does not care for their order, he
could still use Ms' instead of Ms for subsequent messages, to increase efficiency.

Using this procedure, placing n messages by n writers on one stream requires O(n²) steps.
By introducing a special abstract data type, called mutual-reference [168], the three argument
write operation specified by the above program can be implemented by a destructive assignment
so that the cost of sending n messages is O(n). The implementation is also ‘better’ than the
specification in another respect: assuming And-fairness it guarantees that every write operation
will eventually complete, even in the presence of an unbounded number of writers, a property
not guaranteed by the program above. Mutual-references are the standard technique for realizing
efficient stream mergers. Whenever we use a multiplicity-preserving multiple-writer stream in a
program we assume it is implemented efficiently and fairly using mutual-references.
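
The cost argument for the multiple-writer stream can be checked with a small sequential simulation. This is a sketch under stated assumptions, not the mutual-reference implementation: Cell, write and contents are hypothetical names, an unbound Cell plays the role of the uninstantiated tail of the stream, and because the simulation is single-threaded the atomic claim of the tail is not modelled.

```python
class Cell:
    """One stream cell; an unbound cell is the hole at the end of the stream."""

    def __init__(self):
        self.bound = False
        self.head = None
        self.tail = None


def write(msg, stream):
    """Skip bound cells, place msg in the first hole, return (new tail, cells visited)."""
    steps = 1
    while stream.bound:                 # second clause: step past an occupied cell
        stream = stream.tail
        steps += 1
    stream.bound, stream.head, stream.tail = True, msg, Cell()   # first clause
    return stream.tail, steps


def contents(stream):
    out = []
    while stream.bound:
        out.append(stream.head)
        stream = stream.tail
    return out


# n writers, each starting from the head of the stream: 1 + 2 + ... + n visits.
n, head, total = 4, Cell(), 0
for _ in range(n):
    _, steps = write("m", head)
    total += steps
assert total == n * (n + 1) // 2
assert contents(head) == ["m"] * n      # identical messages are all preserved

# One writer reusing the returned tail (the third argument): one visit per message.
tail, total = Cell(), 0
for _ in range(n):
    tail, steps = write("m", tail)
    total += steps
assert total == n
```

The first loop shows the quadratic growth when every writer starts at the head, and the second shows why continuing from the returned tail is cheaper.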

Another application of anonymous mutual exclusion is a distributed queue [165]. In it, client
processes are at the leaves and queue processes are at the internal nodes of a process tree. Each
enqueue(X,ME) or dequeue(X,ME) request is sent up the tree from the leaf process which generated
it, with X carrying the element to enqueue or dequeue and ME being a new mutual exclusion
variable. If a queue process at a node can satisfy the request by matching it with a locally stored
corresponding request, it does so. Otherwise it keeps a copy of the request in its local queue,
and also sends a copy of it to its parent. A request is matched with a corresponding request by
atomically testing the ME fields of the two requests to be variables and setting them to some
value. When attempting to match the requests, the queue process also nondeterministically checks
whether the ME field of any of the requests has been set by another queue process; this indicates
that the request has been satisfied by some other queue, and so it is discarded.

Such a distributed queue can be used for dynamic load balancing, where workers off-load
work by enqueuing, and request work by dequeuing [191]. It is very suitable for this application since
requests are satisfied locally whenever possible, but eventually get to the most global queue (the
root queue) if necessary.

Protected data-structures

Another important application of read-only variables is to protect processes communicating across
trust boundaries. Consider an operating system process interacting with a possibly faulty user
process via an incomplete message protocol, or by incrementally producing some data structure.
If the user process does not obey the protocol, and instead of waiting for the operating system
process to instantiate some variable it instantiates this variable to some erroneous value, it may
cause the operating system process to fail.

Several proposals were made to solve this problem. One is to restrict the type of
communication protocols allowed between user processes and system processes, and provide user processes only
with complete data-structures, with no ‘holes’ to mess with. This solution greatly decreases the
flexibility of the interaction, and puts a heavy synchronisation and termination detection burden
on the operating system.

Another solution is to isolate the components of the operating system interacting with user
processes, and provide them with robust failure handling mechanisms. This solution also seems
infeasible, since incomplete data structures can be passed asynchronously between system
components, and therefore user processes may share variables with arbitrarily ‘deep’ operating system
components.

Another solution, adopted by the operating system designed by ICOT, is to use specialised
filter processes to monitor user-system interaction. These processes forward back and forth
instantiations done by the interacting processes, as long as the user processes obey the protocol which
the operating system expects. When a violation by the user is detected, the filter does not pass it
further to the system. Foster [48] describes three techniques for achieving robustness in operating
systems implemented in languages that do not support read-only variables: at-source (by
transformation of user programs), en-route (by filters) or at-destination (by making system programs
fail-safe). The second technique is shown to be generally the most effective.

Read-only variables allow a simpler solution [76]. An operating system component which
produces a data-structure incrementally can protect the incomplete part of the data structure
from outside intervention. This is done by making it read-only to its readers, and keeping the
writable access to oneself. This is achieved by placing a read-only variable X? in every ‘hole’ in
the data structure, and keeping X. For example, a protected-stream producer can be defined as
follows:

p([X|Xs'?],...) ← p(Xs',...).

If, when p(Xs,...) is invoked, it has the only writable occurrence of its first argument Xs, this
invariant will hold in all future iterations of the process, and no consumer can interfere with the
stream production. If the message itself is also produced incrementally, it could also be protected
using the same technique.
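
The idea of handing out read-only access while keeping the write capability can be pictured with an object-capability analogy. The sketch below is an assumption-laden illustration, not FCP(?) semantics: Writable, ReadOnly, produce and consume are hypothetical names, an attempt to read an unbound variable raises an error where an FCP(?) reader would suspend, and Python offers no real enforcement beyond the absence of a bind method on the read-only handle.

```python
class Writable:
    """The write capability for a single-assignment variable X."""

    def __init__(self):
        self.bound = False
        self.value = None

    def bind(self, value):
        if self.bound:
            raise ValueError("variable is already bound")
        self.bound, self.value = True, value

    def read_only(self):
        """Return the handle that plays the role of X?."""
        return ReadOnly(self)


class ReadOnly:
    """A handle that can observe the variable but offers no way to bind it."""

    def __init__(self, source):
        self._source = source

    @property
    def value(self):
        if not self._source.bound:
            raise LookupError("unbound: a reader would suspend here")
        return self._source.value


def produce(items):
    """Build the stream [x1, x2, ... | hole], exposing only read-only tails."""
    tail = Writable()
    head = tail.read_only()
    for x in items:
        rest = Writable()                       # the producer keeps Xs'
        tail.bind((x, rest.read_only()))        # the consumer sees [x | Xs'?]
        tail = rest
    tail.bind(None)                             # close the stream
    return head


def consume(stream):
    out = []
    while stream.value is not None:
        x, stream = stream.value
        out.append(x)
    return out


assert consume(produce([1, 2, 3])) == [1, 2, 3]
assert not hasattr(produce([1]), "bind")        # consumers cannot fill the holes
```

As in the protected-stream producer, only the producer ever holds a Writable, so the invariant that it owns the sole writable occurrence is preserved at every iteration.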

Discussion 

The first advantage of read-only unification over matching is that it is a generalisation of unification,
rather than a special case of it: read-only unification in the absence of read-only variables is
just unification. Hence read-only unification achieves both communication and synchronisation
with a single notion. Second, read-only unification is symmetric: unlike matching, it does not
distinguish between the goal and the clause, and the read-only unification of any two terms behaves
alike. Third, it is dynamic. Read-only variables can be embedded in any data-structure, hence
synchronisation can be associated with data, not only with procedures.

Some of the disadvantages of read-only unification come from its strength: not being success
stable makes it harder to analyse FCP(?) programs statically, and often makes FCP(?) programs less
readable compared to programs using input matching. Its non-monotonic nature makes it more difficult
to analyse theoretically, compared to languages which use only input matching and unification.
Finally, it has some points of singularity (e.g. the unification of X with X?), which do not seem
to have acceptable intuition behind them.

An alternative concept, called locks, was proposed by M. Miller and E.D. Tribble and
formalised by Saraswat [158]. Its motivation was to provide more reasonable semantics to the unification
X=X?. In FCP(?), this unification subtracts the writing capability from X, making it read-only.
In the alternate proposal, its effect is to make both X and X? writable. The ability of a read-only
variable to become writable gives rise to both additional complications and additional programming
techniques, though it has not been pursued to completion.


16.  FCP(:,?)  —  An  Integration  of  FCP(:)  and  FCP(?) 

The language FCP(:,?) [100] attempts to integrate the convenience and efficiency of matching with
the expressiveness of atomic test unification and read-only variables. In addition, it has the added
pragmatic advantage over FCP(?) of being a superset of Flat GHC, FCP(|), and FCP(:), in the
sense that every program in these languages would execute correctly as an FCP(:,?) program.

FCP(:,?) is as strong as any other language in the family, in the sense that there are natural
embeddings of all languages in the family into it. It is the target language of the implementation
effort at the Weizmann Institute [99].

Syntax 

The  syntax  of  FCP(:,?)  is  the  same  as  that  of  FCP(:),  except  that  the  tell  and  body  parts  may 
contain  read-only  variables. 

Semantics 

The semantics of the language is also the same as FCP(:), except that in the tell part read-only
unification is used instead of ordinary unification. This is reflected in the try function of FCP(:,?),
which is the same as that of FCP(:), except that it uses mgu? instead of mgu, and returns suspend
if the read-only unification in the tell part suspends on a read-only goal variable, and fail if it fails
or suspends on a read-only clause variable (since the latter kind of suspension is stable).

Programming  in  FCP(:,?) 

As mentioned above, any FGHC, FCP(|) or FCP(:) program would execute correctly as an FCP(:,?)
program. The FCP(?) programs shown in the previous section easily translate into FCP(:,?). For
example, the multiple-writer stream is written as follows:

write(M,Ms,Ms') ← true : Ms=[X?|Ms'] | M=X.
write(M,[_|Ms],Ms') ← write(M,Ms,Ms').

and  the  protected  stream  producer  as  follows: 

p(Xs,...) ← true : Xs = [Message|Xs'?] | p(Xs',...).


17.  Doc — “X = X Considered Harmful”

The language Doc (Directed Oc) by Hirata [82] is a successor to Oc [81,83]. Oc is essentially
FGHCnav with no guards. Doc is a further restriction, which follows the motto “X = X considered
harmful”. Doc is a concurrent logic programming language in which every variable has at most
one writer and at most one reader, i.e. one process which instantiates a variable, and one process
that matches it. This restriction is enforced syntactically, by annotating each variable occurrence
as either writable or read-only, and requiring that each variable may occur at most once in
each mode in a clause.

The motivation for this restriction is that the cost of broadcasting information in a distributed
environment may be too expensive to be supported at the language level.

Discussion

Although the removal of variable-to-variable unification from logic programming seems a rather
drastic proposal, its effect is not fatal, and the resulting language is still usable. The techniques
available in Doc (except for protected data structures) are a subset of those available in FGHCnav.
In particular, the short-circuit technique and any of the techniques relying on atomic unification are
not available in Doc. Furthermore, broadcasting is not available in Doc, and should be implemented
by an explicit distributor process, which receives a message and distributes it separately to each
recipient. In addition, Doc’s read-only annotation is reminiscent of the read-only variable, and
indeed it can employ the protected data-structures technique. Actually, a Doc process must protect
any incomplete structure it intends to produce, by the syntactic restrictions of the language.
Because of the ability to specify protected data-structures, it seems that Doc cannot be embedded
in a language that does not contain the equivalent of read-only variables.

An  embedding  of  Doc  in  broadcast-free  FCP(?) 

The similarity of Doc’s annotations to writable and read-only variables in FCP(?) is apparent.
Indeed, it is natural to consider a subset of FCP(?), which may be called²⁷ broadcast-free FCP(?),
in which every variable may occur at most once read-only and at most once writable in each clause.
Doc programs can be trivially translated into broadcast-free FCP(?).
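
The syntactic restriction of the broadcast-free subset is simple enough to check mechanically. The following is a rough sketch, assuming clauses are given as plain text in the notation of this paper, with variables starting with an upper-case letter and read-only occurrences marked by a trailing ?; the function name is_broadcast_free is hypothetical and the regular expression ignores quoted atoms and other lexical details.

```python
import re
from collections import Counter

# A variable name, optional primes, and an optional read-only mark.
VAR = re.compile(r"\b([A-Z][A-Za-z0-9_]*'*)(\?)?")


def is_broadcast_free(clause):
    """True if every variable occurs at most once writable and at most once read-only."""
    counts = Counter(VAR.findall(clause))   # keys are (name, '?' or '') pairs
    return all(n <= 1 for n in counts.values())


assert is_broadcast_free("p(X, Y?) <- q(X?, Y).")
assert not is_broadcast_free("p(X) <- q(X?), r(X?).")      # two readers of X
assert not is_broadcast_free("append([X|Xs],Ys,[X|Zs]) <- append(Xs?,Ys,Zs).")
```

The last example shows that the FCP(?) append of Section 15.2 falls outside the subset, since X occurs twice as a writable variable in its first clause.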

This translation is valid, in the sense that every computation of the resulting FCP(?) program
corresponds to a possible computation of the source Doc program. However, the translation is not
an embedding in the sense used so far. Since the read-only unification used in FCP(?) is atomic,
some executions of a Doc program cannot be realised by the corresponding FCP(?) program.
This can be remedied by further “decoupling” variables in the clause, as done in the embedding
of FGHCnav in FGHCav in Section 10, which masks the atomicity of unification of FCP(?). For
each variable X that occurs both writable and read-only in a clause, replace X? by a new variable
Y?, and add the goal send(X?,Y) to the body of the clause. send is defined as follows. For every
function symbol f/n in the program, n > 0, send has the clause:

send(f(X1,X2,...,Xn),f(Y1?,Y2?,...,Yn?)) ←
    send(X1?,Y1), send(X2?,Y2), ..., send(Xn?,Yn).

We  note  that  broadcast-free  FCP(?)  is  still  stronger  than  Doc,  since  it  provides  a  variant  of  the 
short  circuit  technique.  In  this  variant  a  ground  message  is  sent  around  the  circuit  in  a  particular 
direction.  Its  arrival  at  the  other  end  indicates  termination. 


18.  Non-Flat Concurrent Logic Programming Languages: PARLOG, GHC, Concurrent
Prolog, and CP(↓,|)

A  concurrent  logic  programming  language  is  non-flat  if  the  guard  of  a  clause  may  contain  program 
defined  predicates.  Several  of  the  flat  languages  described  above  —  Flat  GHC,  Flat  PARLOG, 
FCP(|,1),  and  FCP(?)  —  were  actually  derived  from  their  non-flat  ancestors  simply  by  restricting 
the  guard  to  contain  predefined  predicates  only. 

The ability to define guard predicates implies that guard computations may be unbounded
and, in general, may fail to terminate. Nevertheless, as in flat languages, a clause try is an atomic
operation: it succeeds, suspends, or fails, and if it suspends or fails it leaves no trace of its attempt.

Two approaches were taken to ensure atomicity of a clause try; they are also reflected in the
corresponding flat languages. One approach is to forbid guard computations from assigning goal
variables. This way several clauses can be tried in parallel for the same goal without interference.
This approach is taken by PARLOG and GHC, and is reflected in their flat subsets in the
restriction that guards can only do matching, not unification. The second approach is to allow guard
computations to assign goal variables, but to make such assignments visible only upon
commitment. This is reflected in the FCP languages, which allow test unification in guards, but require
the unification attempt to be atomic.

We  discuss  the  non-flat  languages  informally.  Transition  systems  for  non-flat  languages  are 
given  by  Saraswat  [154]  and  Levy  [114]. 

18.1  PARLOG  and  GHC 

PARLOG and GHC are similar in their requirement that guard computations do not instantiate
goal variables, but differ in the way they realize this requirement. In PARLOG, a syntactic
compile-time check, called a safety check, is performed to ensure that the program has no computations
in which guards instantiate goal variables [23]. Since the question whether a program is safe is
undecidable in general [29], any algorithm for determining safety can only perform an approximate
check, and if it correctly rejects all unsafe programs then it is bound to reject some safe programs
as well. This leads to the awkward situation in which the set of legal PARLOG programs is either
undecidable, or is determined by an algorithm whose specification may be both quite complex and
evolving. The practice of PARLOG programming seems to be that the safety check is not done,
and the responsibility of producing safe programs is placed on the programmer’s intuition.

The design of GHC [198] was influenced by an earlier design of PARLOG [24], called
PARLOG83 in [145], which employed output assignment instead of unification, and by critical
examination of Concurrent Prolog [196]. Rather than ruling out the possibility of the guard instantiating
goal variables by a syntactic check, GHC ensures this with its synchronization rule. In fact, the
sole synchronization rule of GHC states that a unification in the head or the guard that attempts
to instantiate a variable in the goal suspends.

The implementation of this synchronization rule in full GHC requires recording for each
variable which level in the process tree it ‘belongs’ to, which imposes considerable complications in the
runtime data-structures and algorithms [114,185]. Therefore two subsets of GHC were identified:
one is the flat subset, introduced in Section 10, another is the safe subset, defined as follows. A
GHC program is safe if it has no execution in which a body unification suspends. Note that a Flat
GHC program is trivially safe. Of course whether a GHC program is safe is also undecidable.

As in their flat subsets, the main difference between Safe GHC and PARLOG is the availability
of sequential-And and sequential-Or in the latter.

Although PARLOG and GHC predate their flat subsets, there are almost no examples which
show that the former languages are significantly more expressive than the latter ones. Perhaps the
one interesting example is that of unbounded nondeterministic choice, implemented by recursion
in the guard. Consider a process c(In,...) which has an unbounded list (or stream) of streams In.
On each iteration, c wishes to extract one element from one of the streams, if such an element is
ready, and iterate with In', which contains the tail of that stream and the unmodified remaining
streams. If all the streams close, the process terminates. Using non-flat guards, the program can
be written in GHC as follows:

c(In,...) ←
    get(X,In,In') |
    ... do something with X ...
    c(In',...).
c(In,...) ←
    halt(In) | true.

get(X,[[X'|Xs]|In],In') ← In'=[Xs|In], X=X'.
get(X,[Xs|In],In') ← get(X',In,In") | X=X', In'=[Xs|In"].

halt([[ ]|In]) ← halt(In).
halt([ ]).

The intermediate variables X' and In" are needed to ensure that the recursive call of get does not
suspend because of an attempt to instantiate the goal variables X or In'.

Note the difference between get and halt. Both are recursive, but halt iterates in the body,
since it tests for a conjunctive condition (all streams have terminated), whereas get iterates in the
guard, since it tests for a disjunctive one (there is an element on one of the streams).

The program cannot be specified directly in a flat language, since it requires nondeterminism
of  unbounded  degree  in  process  reduction.  However,  its  purpose  can  usually  be  achieved  using  a 
merge  network,  which  is  specifiable  in  any  flat  language. 
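The contrast between the disjunctive test made in the guard and the conjunctive test made in the body can be illustrated outside a logic language. The following Python sketch is only an analogy under simplifying assumptions: each stream is modelled as a fully known list (so a stream is never "not yet ready"), and the choice among ready streams, which is nondeterministic in GHC, is resolved by taking the leftmost one. The function names mirror the predicates above; `consume` is a hypothetical stand-in for the process c.

```python
from typing import List, Optional, Tuple

Stream = List[int]


def get(streams: List[Stream]) -> Optional[Tuple[int, List[Stream]]]:
    """Disjunctive test: succeed as soon as one stream offers an element.

    Returns the element together with the new list of streams, in which the
    chosen stream is replaced by its tail and the others are unchanged.
    Returns None when no stream has an element.
    """
    for i, stream in enumerate(streams):
        if stream:
            rest = streams[:i] + [stream[1:]] + streams[i + 1:]
            return stream[0], rest
    return None


def halt(streams: List[Stream]) -> bool:
    """Conjunctive test: every stream must be closed (empty)."""
    return all(not stream for stream in streams)


def consume(streams: List[Stream]) -> List[int]:
    """Iterate like the process c: extract elements until all streams close."""
    seen = []
    while not halt(streams):
        x, streams = get(streams)  # some stream is non-empty here
        seen.append(x)
    return seen
```

Note that the loop in `get` ranges over a list of arbitrary length, which is the unbounded degree of choice that a flat language cannot express in a single process reduction.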

Embedding  Safe  GHC  in  FCP(:) 

Safe GHC can be embedded in FCP(:) using a technique for compiling Or-parallelism into And-
parallelism, developed by Codish and Shapiro [29]. The idea is to spawn And-parallel processes to
evaluate Or-parallel guards, and thread these processes using two short-circuits: a success circuit,
which reports the success of one of the guards, and a failure circuit, which reports the failure of all
guards. The hierarchical And/Or tree is implemented by a hierarchy of success and failure circuits.
The power of FCP(:) is needed since the method requires reflection on the failure of unification.

A mutual exclusion protocol ensures that at most one guard can commit for each goal. Although
the mutual exclusion protocol used in the original embedding [29] relies on atomic unification
(Section 14), the less efficient single-round mutual exclusion protocol (Section 7.4) can be
used as well. The technique was later enhanced by Levy and Shapiro [116] into a compiler from
Safe GHC to FCP(?).

The  technique  cannot  be  used  to  embed  (unsafe)  GHC  in  a  flat  language,  since  a  correct 
implementation  of  GHC  requires  recording  the  guard  in  which  a  variable  is  allocated.  This  problem 
is further discussed in Section 20.

Embedding  PARLOG  in  FCP(:) 

The technique for compiling Or-parallelism into And-parallelism can be combined with the FCP(:)
implementation of the control meta-call to form an embedding of Safe GHC + the control meta-call
in FCP(:). It can be further combined with the techniques for embedding FP(&) and FP(;) in
FGHC.„  to  embed  PARLOG  in  FCP(:). 

18.2 Concurrent Prolog and CP(↓,|)

Concurrent Prolog [160] is the ancestor of FCP(?). Similarly, the language CP(↓,|) [154,157,151] is
the ancestor of FCP(:). Unlike GHC and PARLOG, both allow guard computations to instantiate
goal  variables.  However,  to  achieve  atomicity  of  a  clause  try,  these  instantiations  should  not  be 
visible  outside  the  calling  goal  prior  to  the  commitment  of  the  clause.  In  order  to  perform  Or- 
parallel  clause  evaluation  in  Concurrent  Prolog,  a  ‘multiple-environments’  mechanism  is  necessary. 
This  mechanism  allows  competing  clauses  to  make  temporary  and  hidden  instantiations  to  goal 
variables,  which  become  permanent  and  visible  only  upon  commitment.  Several  approaches  to 
the construction of such a mechanism were investigated [114], but none have led to satisfactory
results.  The  difficulty  in  constructing  such  a  mechanism  can  be  understood  by  examining  the 
power  of  Concurrent  Prolog.  It  can  specify  almost  trivially  an  Or-Parallel  Prolog  interpreter, 
which  simulates  the  don’t-know  nondeterminism  of  Prolog  by  recursion  in  guards. 

An  embedding  of  Or-parallel  Prolog  in  Concurrent  Prolog 

The Or-parallel Prolog interpreter assumes that the Prolog program is represented by the Concurrent
Prolog procedure clauses/2, which returns on the call clauses(A,Cs) the list of clauses
Cs potentially unifiable with the goal A. In principle Cs can be the entire Prolog program, but
indexing on procedure names or even on goal arguments can be used to reduce the number of
clauses returned. Each Prolog clause A ← B1,...,Bn is translated into a term in the list Cs of the
form (A←[B1,...,Bn|Bs]\Bs). Note that it represents the (possibly empty) body by a (possibly
empty) difference-list of goals. Given this translation, an Or-Parallel Prolog interpreter can be
written in Concurrent Prolog as follows:

solve([ ]).
solve([A|As]) ←
    clauses(A,Cs), resolve(A,Cs?,As?).

resolve(A,[(A←Bs\As)|Cs],As) ←
    solve(Bs) | true.
resolve(A,[_|Cs],As) ←
    resolve(A,Cs,As) | true.

The  interpreter  as  defined  can  return  only  one  answer  to  a  goal.  This  limitation,  however,  is 
shared  also  by  Prolog  meta-interpreters.  To  collect  all  solutions  to  a  goal,  a  set  abstraction  is 
incorporated  in  Prolog.  It  is  typically  implemented  by  storing  the  solution  found  (using  a  side- 
effect)  and  inducing  failure.  The  approaches  of  Ueda  [200,201]  and  Shapiro  [165],  in  comparison, 
naturally  collect  all  solutions  to  a  goal. 

The  simplicity  of  this  interpreter  indicates  that  the  implementation  of  the  multiple- 
environments  mechanism  of  Concurrent  Prolog  is  at  least  as  difficult  as  the  direct  implementation 
of  Or-Parallel  Prolog.  Presently  it  seems  that  the  added  complexity  of  Concurrent  Prolog  over  its 
flat  subset  outweighs  its  added  expressiveness. 


This  interpreter  is  due  to  Kenneth  M.  Kahn. 
PART IV. IMPLEMENTATIONS AND APPLICATIONS

19.  Implementations  of  Concurrent  Logic  Programming  Languages 

Considerable  effort  has  been  invested  in  efficient  implementations  of  concurrent  logic  programming 
languages,  for  both  sequential  and  parallel  computers. 

19.1  Sequential  implementations 

We consider in depth implementation techniques for flat languages, then mention briefly techniques
for  non-flat  languages. 

There  are  several  implementations  of  flat  languages  [f  ’,49,89].  All  employ  some  variant  of 
an abstract machine developed by a group at the Weizmann Institute, first incorporated in an
interpreter  for  FCP  [128],  and  later  refined  and  integrated  with  techniques  for  compiling  unification, 
developed  by  Warren  [206],  within  a  compiler/emulator  based  implementation  [89]. 

The  sequential  abstract  machine 

The  key  concepts  of  the  machine  are  as  follows.  The  machine  represents  the  goal  by  an  active 
queue  and  a  set  of  suspension  lists.  Each  process  in  the  goal  is  either  in  the  active  queue  or  in  one 
or  more  suspension  lists.  Each  suspension  list  is  associated  with  an  unbound  variable,  and  may 
consist  of  several  processes. 

The basic operation of the machine is to dequeue a process from the active queue, and try to
reduce it with some clause in the program. This operation is called a process try. A process try is
composed of a sequence of clause tries. In each clause try the try function of the process and the
clause is computed (see Section 4.2). A process try succeeds if one of its clause tries succeeds; it
suspends if none succeeds, but at least one suspends; it fails if all clause tries fail. When a process
try succeeds the try substitution θ is computed. When a process try suspends a set of suspension
variables is computed; a variable is included in the set if its being instantiated in the future may
release some clause try from suspension, i.e. cause it to succeed or fail.
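The rule for combining clause tries into a process try can be written down directly. The following Python sketch assumes a hypothetical encoding that is not part of any of the implementations cited: each clause-try outcome is a tuple tagged "success" (with a substitution), "suspend" (with a set of suspension variables), or "fail".

```python
def process_try(clause_tries):
    """Combine a sequence of clause-try outcomes into a process-try outcome.

    Each outcome is one of (hypothetical encoding):
      ("success", substitution)   the clause try succeeded
      ("suspend", variables)      the clause try suspended on these variables
      ("fail",)                   the clause try failed
    """
    suspension_set = set()
    suspended = False
    for outcome in clause_tries:
        if outcome[0] == "success":
            return outcome  # one successful clause try is enough
        if outcome[0] == "suspend":
            suspended = True
            suspension_set |= outcome[1]  # collect suspension variables
    if suspended:
        return ("suspend", suspension_set)
    return ("fail",)  # every clause try failed
```

The suspension set is the union over all suspended clause tries, since instantiating any of these variables may release some clause try from suspension.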

If a process try succeeds with a substitution θ then the goals in the body of the successful
clause are added to the active queue, and θ is applied to the state of the computation. In addition,
processes in suspension lists of variables in the domain of θ are moved to the active queue. If the
process try suspends with a suspension set S then the process is added to the suspension lists of
each of the variables in S. If the process try fails the machine halts with an error state.

Note  that  a  process  can  suspend  on  several  variables,  and  be  activated  and  suspended  several 
times  before  succeeding  or  failing.  A  mutual  exclusion  mechanism,  described  below,  ensures  that 
a process is activated at most once per suspended process try.

The  machine  is  connected  to  one  or  more  external  input  devices,  realised  by  data  streams, 
including  a  keyboard,  and  typically  has  a  process  consuming  each  stream.  The  machine  terminates 
successfully  when  all  external  input  streams  are  closed,  and  there  are  no  processes  left.  It  terminates 
with  deadlock  if  all  input  streams  are  closed  and  only  suspended  processes  are  left. 

The machine maintains all dynamic data structures in a single address space, called a heap.
The heap grows when terms are allocated and processes are created, and shrinks by garbage-
collection. The structures in the heap are variables, terms, process records, suspension records,
activation records, and programs.

A variable is represented by one memory word, which is either empty or points to a suspension
list. When a variable is instantiated to a term, its memory word becomes a reference (pointer) to
the term unless the term can be stored in one word (e.g. an integer), in which case it is stored in
place of the variable. Other terms are represented using standard techniques. A process with a
predicate p/n is represented by n+2 words: one for the program counter, which points at the code
of the procedure p/n, n words for the process arguments, and one word for chaining the process in


Figure  7:  Suspending  a  process  on  two  variables 


the  active  queue.  The  active  queue  consists  of  chained  processes.  A  suspension  list  consists  of  a  list 
of  suspension  records  (which  could  be  list  cells).  Each  suspension  record  points  to  an  activation 
record  and  to  the  next  suspension  record  if  there  is  one.  The  activation  record  realises  the  mutual 
exclusion  mechanism  which  prevents  multiple  activations  of  the  same  process.  It  either  points  to  a 
process  record  or  is  null,  if  the  process  has  already  been  activated.  If  a  process  suspends  on  several 
variables,  the  suspension  records  in  the  suspension  lists  of  these  variables  all  point  to  the  same 
process  activation  record,  which  in  turn  points  to  the  process.  The  first  variable  to  be  assigned 
activates the process by enqueuing it to the active queue, and sets its activation record to null. This
prevents the other variables from re-activating this process. A process suspended on two variables
is  shown  in  Figure  7. 
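The interplay of the active queue, the suspension lists and the activation records can be sketched as a small Python model. It is an illustration under stated assumptions, not the machine of [89]: the heap is replaced by ordinary Python objects, and the process try is delegated to a caller-supplied function `reduce`, a hypothetical stand-in that returns ("success", bindings, body), ("suspend", variables) or ("fail",). The class `Activation` and the demo program `demo_reduce` are likewise invented for the example.

```python
from collections import deque


class Activation:
    """Activation record: points to a process, or None once it was activated."""

    def __init__(self, process):
        self.process = process


def run(goal, reduce):
    """Run the processes in `goal`; return (store, deadlocked)."""
    store = {}            # variable name -> value
    active = deque(goal)  # the active queue
    suspension = {}       # variable name -> list of activation records
    while active:
        process = active.popleft()
        outcome = reduce(process, store)
        if outcome[0] == "success":
            _, bindings, body = outcome
            active.extend(body)  # body goals join the active queue
            for var, value in bindings.items():
                store[var] = value
                # Wake the processes suspended on var, each at most once.
                for record in suspension.pop(var, []):
                    if record.process is not None:
                        active.append(record.process)
                        record.process = None
        elif outcome[0] == "suspend":
            record = Activation(process)  # shared by all suspension lists
            for var in outcome[1]:
                suspension.setdefault(var, []).append(record)
        else:
            raise RuntimeError(f"process {process!r} failed")
    deadlocked = any(r.process is not None
                     for records in suspension.values() for r in records)
    return store, deadlocked


def demo_reduce(process, store):
    """Hypothetical program: set(V, T) binds V; first(X, Y, Out) copies
    whichever of X and Y is bound, suspending on both if neither is."""
    if process[0] == "set":
        _, var, value = process
        return ("success", {var: value}, [])
    if process[0] == "first":
        _, x, y, out = process
        for v in (x, y):
            if v in store:
                return ("success", {out: store[v]}, [])
        return ("suspend", {x, y})
    return ("fail",)
```

Running the goal `[("first", "X", "Y", "Z"), ("set", "X", 1), ("set", "Y", 2)]` suspends `first` on both X and Y; the assignment to X wakes it and nulls the shared activation record, so the later assignment to Y finds nothing to wake.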

In  addition  to  the  heap,  the  machine  has  global  registers  for  the  active  queue  front  and 
back,  top-of-heap  pointer,  current  process,  current  program  counter,  etc.  In  a  language  with  test 
unification,  the  machine  also  has  a  trail.  The  trail  is  used  to  record  assignments  made  during  test 
unification  in  a  clause  try,  so  that  they  can  be  undone  if  the  test  unification  subsequently  suspends 
or  fails.  Unlike  the  standard  Prolog  trail,  which  needs  to  support  deep  backtracking,  the  trail  in 
flat languages needs to support only shallow backtracking, and is reset on every clause try. As a
result it can be rather small (e.g. 256 words).
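A trail that supports only shallow backtracking is a very small data structure. The Python sketch below is an assumption-laden illustration: the store is a dictionary, a clause try is modelled as a set of tentative bindings followed by a guard test, and the names `Trail` and `clause_try` are hypothetical.

```python
class Trail:
    """Records assignments made during one clause try so they can be undone."""

    def __init__(self, store):
        self.store = store
        self.entries = []  # variables assigned since the last reset

    def bind(self, var, value):
        self.entries.append(var)
        self.store[var] = value

    def undo(self):
        """Shallow backtracking: remove every assignment of this clause try."""
        while self.entries:
            del self.store[self.entries.pop()]

    def commit(self):
        """The clause try succeeded: keep the assignments, reset the trail."""
        self.entries.clear()


def clause_try(store, tentative, guard):
    """Tentatively apply `tentative` bindings, then test `guard(store)`.

    On any conflict or guard failure the store is restored exactly.
    """
    trail = Trail(store)
    for var, value in tentative.items():
        if var in store:
            if store[var] != value:  # conflicting value: the try fails
                trail.undo()
                return False
        else:
            trail.bind(var, value)
    if not guard(store):
        trail.undo()
        return False
    trail.commit()
    return True
```

Because the trail never outlives a clause try, its size is bounded by the number of assignments a single test unification can make.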

The machine employs several optimisations, the most important being tail-recursion
optimisation^39. Each dequeued process is given a time-slice t (e.g. t = 25). When a process
A with time-slice t is reduced to the processes B1,...,Bk, k ≥ 1, then one of them, say B1, reuses
A's process record (if it is large enough), inherits the time-slice t-1, and is immediately tried if t
> 1. For the other processes B2,...,Bk new process records are allocated, and they are enqueued
to the back of the active queue. If t = 1 then B1 is also enqueued. This scheme maintains And-
fairness while decreasing process switching and memory access (assuming some process arguments are
maintained in processor registers during a time-slice).
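The scheduling effect of the time-slice can be seen in a small Python sketch. It makes several simplifying assumptions: suspension and record reuse are ignored, `reduce` is a hypothetical function mapping a process to the list of its body processes, and when the time-slice is exhausted the first body process is enqueued after the others (the order is not specified above).

```python
from collections import deque


def run(goal, reduce, time_slice=25):
    """Reduce processes with time-sliced continuation of the first body goal.

    Returns the sequence of processes in the order they were tried.
    """
    active = deque(goal)
    trace = []
    while active:
        process, t = active.popleft(), time_slice
        while process is not None:
            trace.append(process)
            body = reduce(process)
            if not body:
                break  # the process terminated
            first, others = body[0], body[1:]
            active.extend(others)  # B2..Bk go to the back of the queue
            if t > 1:
                process, t = first, t - 1  # B1 is tried immediately
            else:
                active.append(first)  # time-slice exhausted: enqueue B1
                process = None
    return trace


def countdown(process):
    """Hypothetical program: (name, n) reduces to (name, n-1) until n is 0."""
    name, n = process
    return [(name, n - 1)] if n > 0 else []
```

With `time_slice=2`, two countdown processes a and b are tried in the order a, a, b, b, a, a, b, b: each runs for two reductions before yielding, which bounds how long any process in the active queue waits.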

To increase the chance of a process record being reused, minimal size records are allocated
(e.g. 10 words). In addition, free-lists of process records, suspension records, and activation records
are maintained between garbage collections, to improve storage utilisation.


39 This name is kept for historical reasons. The optimisation applies to any clause, not necessarily recursive, and
not necessarily to the tail call.

Implementations  of  non-flat  languages 

One way to achieve atomicity of a clause try in a non-flat language is to try and reduce goals in
some order; when reducing a goal, try each clause in some order; and for each clause guard apply
this execution algorithm recursively. This is the algorithm incorporated in the first interpreter for
Concurrent Prolog, written in Prolog [160]. Variants of it were implemented on top of Prolog, both
for Concurrent Prolog and for GHC and CP(↓,|,&) [203,153]. This execution algorithm, however,
does not satisfy any fairness requirements. For example, an attempt to reduce a faulty process
(with a nonterminating guard) may block the rest of the system forever.

Several other execution algorithms for Concurrent Prolog which do not suffer from this problem
were investigated [115,127,141]. Their complexity, however, seemed unacceptable, and was
partially a motivation for the development of Flat Concurrent Prolog and the simpler non-flat
languages, GHC, PARLOG, and CP(↓,|). An abstract machine for PARLOG was developed by
Gregory et al. [68], and later optimised by Crammond [34]. Its basic design differs from the FCP
abstract machine [89] in that it explicitly maintains a process tree. Another abstract machine for
GHC was derived from the FCP machine by Levy [113]. Although GHC is simpler than Concurrent
Prolog, its implementation still required fairly heavy machinery. Therefore Safe GHC was
investigated, and a compiler from Safe GHC to FCP was developed [116].

Compilation  of  unification 

The  basic  data  manipulation  of  logic  languages  is  unification.  Warren  [206]  has  developed  a  method 
for compiling unification efficiently by identifying its various special cases which are specified in a
clause  head,  and  generating  special  instructions  for  them. 

Warren’s  scheme  was  designed  for  Prolog’s  general  unification,  and  is  applicable  both  to  FCP’s 
read-only  unification  [89],  and  to  the  input  matching  employed  by  FGHC  [97]  and  PARLOG. 
Using  it,  an  abstract  machine  along  the  lines  described  above  can  achieve  the  same  uniprocessor 
performance  as  the  Warren  abstract  machine  for  Prolog. 

However,  for  input  matching  one  can  do  better  than  Warren’s  scheme.  The  input  matching 
component  of  a  set  of  clauses  of  the  same  procedure  can  be  jointly  compiled  into  a  decision  tree, 
which combines shared matchings and finds more efficiently the set of applicable clauses [99].

Processor architectures

Two processor architectures specialised for the execution of a concurrent logic programming language,
namely FCP(?), were developed. The first architecture, Carmel [74,75], takes the RISC
approach. It augments a simple processor architecture with mechanisms to support the expensive
or frequent operations of FCP(?). By carefully tuning the instruction set and processor architecture,
impressive performance is obtained.

The second architecture, by Alkalaj and Shapiro [5], takes the view that internal concurrency
in a processor combined with a carefully designed memory hierarchy is the key to high performance.
The architecture consists of several specialised processing units, each with its own memory
hierarchy. The reduction and tag processors are at the root of the hierarchy. They are supported
by three additional processing units: an instruction processor, a data-trail processor, and a
goal-management processor. The instruction processor employs standard techniques for instruction
prefetching and caching. The data-trail processor employs a data cache enhanced to support shallow
backtracking, required in the implementation of atomic test unification. The goal-management
processor manages the top of the process queue in a way analogous to how a RISC processor manages
the top of the activation stack. The goal-management processor manages process switching,
spawning, activation, and suspension, using a bank of register windows. The execution algorithms
of this architecture are specified using an FCP(?) program, by hardware description techniques
developed by Suzuki [172] and Weinbaum and Shapiro [208]. The specification forms a working
simulator of the architecture. The performance of this architecture is yet to be evaluated. How
these two processors can be integrated in a multiprocessor architecture is an open question.

The PSI-II processor was designed for the execution of Prolog, but was re-microcoded to
implement KL1 [195]. It is the building block of the Multi-PSI parallel machine.

19.2 Parallel implementations

We review the concepts behind two types of parallel implementations: distributed and shared-
memory. The implementations include a distributed implementation of FCP [190], a distributed
implementation of FGHC [90], a distributed implementation of Flat PARLOG [47], and a shared-
memory implementation of PARLOG [34].

The  core  operation  in  these  implementations  is  unification. 

Distributed  atomic  unification 

In a distributed implementation each processor executes a variant of the sequential abstract machine,
described above, and takes special actions when a clause try involves variables shared with
other  processors.  These  actions  realise  a  distributed  unification  algorithm. 

Since  non-variable  terms  are  immutable  data  structures,  they  can  be  replicated  upon  demand 
throughout  a  processor  network  without  any  special  consistency  maintenance  mechanisms.  The 
writing  on  a  variable,  however,  needs  to  be  coordinated.  In  particular,  in  a  language  with  atomic 
unification,  a  unification  that  involves  writing  on  several  variables  should  either  succeed  in  writing 
on  all  of  them,  or  write  on  none.  Hence,  from  a  distributed  implementation  point  of  view,  an 
atomic unification is best viewed as an atomic transaction, which may read from and write to
several  logical  variables.  Standard  database  concurrency-control  techniques  for  realising  atomic 
transactions  can  be  adapted  to  the  particular  requirements  of  unification. 

One approach, applicable to a network of processors without shared memory, is as follows.
It uses the messages read, lock, become_value, and become_local. A variable shared by several
processors is represented by a directed tree, with edges pointing towards the root. Each processor
sharing a variable stores a node of the tree in its local memory, which contains the address of the
node it is pointing to if it is not the root. An occurrence of a variable is called remote if it is an
internal node in the tree; local if it is the root of the tree.

An attempt to read a shared remote variable is called a read-fault. A processor executing a
process which has had a read-fault sends a read request up the tree, and adds the faulting process
to the remote variable's suspension list. When a processor storing the root of the tree receives a
read message, it operates as follows. If the variable has been assigned a term T, a become_value(T)
message is sent in reply. If the variable is still unbound, the read request is stored in the variable's
suspension list, and will be replied to when the variable is assigned.

A  shared  variable  can  be  written  only  at  its  root.  Write-permission  is  transferred  between 
processors  by  changing  and  redirecting  edges  in  the  tree.  A  processor  with  a  local  shared  variable 
(i.e.  the  root  of  a  shared  variable  tree)  may  write  on  it  when  it  pleases.  It  ensures  that  a  unification 
that  involves  writing  on  several  shared  variables  is  atomic  by  not  responding  to  messages,  including 
read  messages,  while  performing  a  clause  try. 

An attempt to write on a remote shared variable is called a write-fault. A processor executing
a process which has had a write-fault sends a lock message up the variable's tree, and suspends
the faulting process on the remote variable. The processor receiving this message replies with a
become_value(T) if the variable has already been assigned a term T, or with a become_local(Reads)
if the variable is still unbound, and changes its local variable to be a remote variable pointing at
the sender's variable. Reads is the (possibly empty) list of suspended read requests on the sender's
local variable suspension queue, to which a request from the sender's own variable is added in case
it has local processes suspended on it. The receiver of a become_local(Reads) message changes its
variable from remote to local, wakes up all processes suspended on it, and adds the Reads to the
variable's suspension list.
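The read-fault and write-fault handling can be traced in a single-threaded Python simulation. This is a deliberately reduced sketch and rests on several assumptions: message passing is replaced by direct function calls, forwarding a message up the tree is modelled by following parent pointers, local processes suspended on an occurrence are not modelled (so the extra read request added on their behalf is omitted), and atomicity of multi-variable unification is not shown. All class and function names are hypothetical.

```python
class Occurrence:
    """One processor's occurrence of a shared variable (a node of the tree)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # None: local (the root); otherwise remote
        self.bound = False
        self.value = None
        self.reads = []       # suspended read requests, kept at the root


def root_of(occurrence):
    """Follow the edges towards the root, as a forwarded message would."""
    while occurrence.parent is not None:
        occurrence = occurrence.parent
    return occurrence


def become_value(occurrence, term):
    occurrence.bound, occurrence.value = True, term


def read(occurrence):
    """Handle a read-fault; return True if the value is now available."""
    root = root_of(occurrence)
    if root.bound:
        become_value(occurrence, root.value)  # reply: become_value(T)
        return True
    if root is not occurrence and occurrence not in root.reads:
        root.reads.append(occurrence)         # answered upon assignment
    return False


def lock(occurrence):
    """Handle a write-fault; return True if the occurrence is local and unbound."""
    root = root_of(occurrence)
    if root.bound:
        become_value(occurrence, root.value)  # reply: become_value(T)
        return False
    if root is not occurrence:                # reply: become_local(Reads)
        reads, root.reads = root.reads, []
        root.parent = occurrence              # old root becomes remote
        occurrence.parent = None              # requester becomes local
        occurrence.reads.extend(r for r in reads if r is not occurrence)
    return True


def assign(occurrence, term):
    """Write on a local unbound variable and answer the suspended reads."""
    assert occurrence.parent is None and not occurrence.bound
    become_value(occurrence, term)
    for reader in occurrence.reads:
        become_value(reader, term)
    occurrence.reads = []
```

After a successful `lock`, the old root points at the requester, so every other occurrence still reaches the new root by following edges; the structure remains a tree.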

The scheme as described may result in livelock, if two processors keep sending lock requests to
each other, and none accumulates enough local variables to perform a process reduction. To prevent
this, a 2-phase-locking scheme can be incorporated [190,192]. The scheme requires additional
bookkeeping by a write-faulting processor, but not additional messages. We do not describe its
details here.

Another  question  to  address  is  how  to  handle  variable  to  variable  unifications.  One  approach 
is  to  lock  (i.e.  make  local)  the  two  variables  when  assigning  one  to  the  other.  This  ensures  that 
no  cycles  are  created,  but  may  cause  superfluous  contention  in  applications  using  the  short-circuit 
technique.  A  second  approach  is  to  impose  some  ordering  on  variables,  and  to  respect  this  ordering 
when  unifying  two  variables.  Another  approach  is  not  to  prevent  the  creation  of  cycles,  but  to 
break  them  when  they  are  detected. 
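The second approach, respecting an ordering on variables, can be sketched in Python. The particular ordering used here, creation order, is an assumption made for the example; the text above only requires that some ordering exists. The point of the sketch is the invariant: a reference always leads from a later variable to an earlier one, so no chain of references can return to its starting point.

```python
import itertools

_counter = itertools.count()


class Var:
    """An unbound variable with a position in an assumed total order."""

    def __init__(self, name):
        self.name = name
        self.order = next(_counter)  # assumption: order of creation
        self.ref = None              # reference to another variable, if any


def deref(v):
    """Follow references to the variable at the end of the chain."""
    while v.ref is not None:
        v = v.ref
    return v


def unify_vars(x, y):
    """Unify two variables; the later one always refers to the earlier one."""
    x, y = deref(x), deref(y)
    if x is y:
        return x
    low, high = (x, y) if x.order < y.order else (y, x)
    high.ref = low
    return low
```

Because the direction of the reference depends only on the ordering and not on which side initiated the unification, two processors unifying the same pair in opposite directions would choose the same edge.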

Implementation  of  non-atomic  unification  and  the  meta-call  construct 

In  languages  without  atomic  unification,  such  as  GHC,  PARLOG,  and  their  flat  subsets,  simpler 
algorithms  than  the  one  described  above  apply.  For  example,  when  unifying  a  remote  variable  X 
with  a  term  T  it  is  not  necessary  to  bring  X  locally  before  assigning  it;  instead,  a  message  can  be 
sent to X, requesting it to unify with T. If the unification fails, the machine halts with an error
state  (or  simply  notes  the  inconsistency  and  proceeds). 

Since neither of these behaviors is acceptable in a multi-tasking operating system, the
meta-call construct, described in Section 10.3, was developed. The implementation of the meta-call
construct must be integrated with the distributed unification algorithm in order to detect
termination and to correctly ascribe failure. One approach, taken in the distributed implementation
of FGHC [90], is to associate with every computation (invocation of a meta-call) a unique identifier,
and maintain tables associating computation identifiers with the appropriate streams of the meta-call.
When a unification fails, this fact is reported to the computation by placing a message on the
appropriate stream. Since the short-circuit technique is not applicable, distributed termination of
a computation is detected by maintaining an explicit distributed counter for each computation, at
the language implementation level.

Foster [47] describes an alternative approach to the distributed implementation of the control
meta-call, which avoids the complexity of FGHC's distributed counters. Only uniprocessor computations
are supported directly in the implementation, and remote structure-to-structure unification
operations are performed locally. Acknowledgement messages and message counting on individual
nodes hence suffice for termination detection. Termination detection in distributed (multi-node)
tasks is programmed in PARLOG using the usual techniques.

A  shared-memory  implementation 

Crammond [34] describes a parallel implementation of PARLOG on a shared-memory multiprocessor.
In this implementation each processor has its own data areas, although processors may access
each other's areas in order to read the value of a shared variable, to assign a shared variable, or to
take work (processes) from each other. A simple locking mechanism is employed, where a processor
that modifies a shared object (e.g. a process queue or a shared logical variable) locks it, and
a processor attempting to lock a locked object busy waits ("spins") until this object is unlocked.
Since PARLOG does not have atomic unification, a processor needs at most one lock at a time,
and hence this locking scheme does not result in deadlock. An extension of this implementation
scheme to languages with atomic unification would require some concurrency control mechanism
similar to the one discussed in this section above for distributed atomic unification.
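The one-lock-at-a-time discipline can be illustrated with Python threads. This is a sketch, not the implementation of [34]: the spin lock is built on `threading.Lock` with non-blocking acquisition purely to make the busy wait visible, and the names `SpinLock`, `SharedVariable` and `assign` are hypothetical.

```python
import threading


class SpinLock:
    """A lock whose acquirer busy waits until the object is unlocked."""

    def __init__(self):
        self._held = threading.Lock()

    def acquire(self):
        while not self._held.acquire(blocking=False):
            pass  # spin

    def release(self):
        self._held.release()


class SharedVariable:
    """A single-assignment variable protected by its own lock."""

    def __init__(self):
        self.lock = SpinLock()
        self.bound = False
        self.value = None


def assign(variable, value):
    """Assign the variable if unbound; return True if `value` is consistent.

    Only this one lock is held, and it is released before returning, so a
    thread never waits for a second lock while holding a first.
    """
    variable.lock.acquire()
    try:
        if not variable.bound:
            variable.bound, variable.value = True, value
            return True
        return variable.value == value
    finally:
        variable.lock.release()
```

If several threads race to assign distinct values to the same variable, exactly one of them succeeds and the rest observe an inconsistency, whatever the interleaving.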

A simple load balancing scheme is employed in this implementation, where a processor dequeues
processes from its own queue as long as it is not empty, and dequeues from some other
processor with a nonempty queue if its own queue is empty. Using such a scheme, this implementation
obtained a speedup of up to 1$ using 20 processors. Alternative load balancing schemes can
be incorporated in this implementation with little difficulty.
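The dequeuing rule of this load balancing scheme fits in a few lines of Python. The sketch below is sequential and omits the locking of queues; which other processor is chosen, and from which end of its queue work is taken, are assumptions of the example (here: the lowest-numbered nonempty queue, from its front).

```python
from collections import deque


def next_process(me, queues):
    """Return the next process for processor `me`, or None if all queues are empty.

    A processor serves its own queue first and takes work from another
    processor only when its own queue is empty.
    """
    if queues[me]:
        return queues[me].popleft()
    for other, queue in enumerate(queues):
        if other != me and queue:
            return queue.popleft()  # take work from a busy processor
    return None
```

For example, with `queues = [deque(), deque(["p", "q"])]`, processor 0 has no work of its own and takes "p" from processor 1.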

An  analysis  of  a  shared-memory  implementation  of  Flat  GHC  is  reported  by  Tick  [193]. 
19.3 Process to processor mapping

The question of how to map processes to processors is not unique to concurrent logic programming,
and any general approach or solution may be applicable. Approaches to the problem fall into two
general categories: methods in which the program itself (or programs associated with it) specify
the mapping, and dynamic mapping techniques, incorporating load-balancing algorithms. Hybrid
techniques are also possible.

We show how instances of the two approaches can be realised using distributed meta-interpreters.
The interpreters are shown in FCP(:), although they could be written in any flat
language.

Mapping  with  Turtle-programs 

The use of Turtle-programs for mapping processes to processors was suggested and demonstrated
in [163]. Assume that the parallel machine is a (finite or infinite) two-dimensional grid. View each
process as a LOGO-like Turtle, which has a position on the grid, and a heading. With each process
activation (body goal) P we can associate a LOGO-like Turtle program TP, as in P@TP. The
meaning of the call P@TP is that P should have the position and heading obtained by applying
TP to the position and heading of its parent process, and execute in the processor corresponding to
that position. Processes without an associated Turtle program simply inherit their parent's position
and heading.

Using this notation, a sequence of processes can be easily mapped on a sequence of processors.
For example, consider the vm vector-matrix multiplication program in Section 7.2. Adding the
@forward Turtle program to the recursive call to vm causes the inner-product processes ip to be
placed on a sequence of adjacent processors:

% vm(Xv,Ym,Zv) ← multiplying the vector Xv by the matrix Ym gives the vector Zv.

vm(_,[ ],Zv) ← Zv=[ ].
vm(Xv,[Yv|Ym],Zv) ← Zv=[Z|Zv'],
    ip(Xv,Yv,Z), vm(Xv,Ym,Zv')@forward.

Mapping process arrays to processor arrays is just as easy. Consider the matrix multiplication
program mm in Section 7.2. Adding the @forward Turtle program to the recursive call to mm,
and @right to the initial call to vm maps the array of ip processes to an isomorphic array of
processors:

% mm(Xm,Ym,Zm) ←
%     Zm is the result of multiplying the matrix Xm with the transposed matrix Ym.

mm([ ],_,[ ]).
mm([Xv|Xm],Ym,[Zv|Zm]) ←
    vm(Xv,Ym,Zv)@right, mm(Xm,Ym,Zm)@forward.
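To check that the forward and right annotations really spread the ip processes over a two-dimensional array, the placement can be computed by hand or with a short Python sketch. The semantics assumed here follow LOGO and are not spelled out above: forward moves one step along the current heading, right turns the heading clockwise by a quarter turn without moving. The function names and the optional torus size are hypothetical.

```python
HEADINGS = ["north", "east", "south", "west"]  # clockwise order
STEP = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}


def apply_turtle(program, position, heading, size=None):
    """Apply a list of Turtle commands to a position and heading.

    If `size` is given, positions wrap around a size-by-size torus.
    """
    x, y = position
    for command in program:
        if command == "forward":
            dx, dy = STEP[heading]
            x, y = x + dx, y + dy
            if size is not None:
                x, y = x % size, y % size
        elif command == "right":
            heading = HEADINGS[(HEADINGS.index(heading) + 1) % 4]
        elif command == "left":
            heading = HEADINGS[(HEADINGS.index(heading) - 1) % 4]
        else:
            raise ValueError(f"unknown Turtle command: {command}")
    return (x, y), heading


def place_ip_array(rows, cols, start=(0, 0), heading="north"):
    """Positions of the ip processes spawned by mm, indexed by (row, column)."""
    placement = {}
    mm_pos, mm_head = start, heading
    for i in range(rows):
        # The initial call to vm carries the right program.
        vm_pos, vm_head = apply_turtle(["right"], mm_pos, mm_head)
        for j in range(cols):
            placement[(i, j)] = vm_pos  # ip inherits vm's position
            vm_pos, vm_head = apply_turtle(["forward"], vm_pos, vm_head)
        # The recursive call to mm carries the forward program.
        mm_pos, mm_head = apply_turtle(["forward"], mm_pos, mm_head)
    return placement
```

Starting at the origin heading north, the ip process for row i and column j lands at grid position (j, i), so distinct ip processes occupy distinct processors.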

The mapping of additional process structures is discussed in [163]. An alternative mapping strategy
is described in [174]. Here we show an enhanced distributed meta-interpreter which implements
Turtle program mapping.

We assume that the underlying machine is a torus-connected mesh of processors (a virtual
torus can be mapped on a two-dimensional mesh by placing four virtual processors per physical
one). The interpreter consists of a torus of processor processes. We assume that these processes
are mapped to the underlying processors using the torus program shown in Section 7.2.

Each processor process has four outgoing streams to its neighbors. Its four incoming
streams are merged into one. An interpreted process has a heading, and possibly also a
Turtle program. A headed process is represented by a pair (Goal,Heading) where Heading
is one of (north, south, east, west). To a headed process (G,H) a Turtle program TP may
be attached, as in (G,H)@TP. We assume the process in each processor is called
processor(In,[In,ToNorth,ToSouth,ToEast,ToWest]), where the first argument is the merger of its
neighbors' outgoing streams, and its second argument is the list of its five outgoing streams, one to itself
and four to its neighbors. The processor's code is as follows:

processor([(Goal,Heading)|In],Out) ←
    reduce(Goal,Heading,In),
    processor(In,Out).
processor([(G,H)@TP|In],Out) ←
    route(G,H,TP,Out,Out'),
    processor(In,Out').

It receives goals on its input stream. If a goal has a Turtle program, the processor routes it to the
appropriate output stream. Otherwise it executes the goal locally. The execution may result in new
goals, possibly with Turtle programs. They are merged into its input stream, and treated normally.
The meta-interpreter reduces goals, maintaining their heading, and when it encounters a goal with a
Turtle program it sends it to the processor for routing.

reduce(true,_,_).
reduce((A,B),H,Out) ←
    reduce(A,H,Out), reduce(B,H,Out).
reduce(goal(A),H,Out) ←
    clause(A,B), reduce(B,H,Out).
reduce(A@TP,H,Out) ←
    write((A,H)@TP,Out).

The  router  is  specified,  without  showing  its  code: 

route(Goal,Heading,TP,Out,Out') ←
    Send Goal according to TP and Heading on the appropriate
    Out stream, with an updated heading and possibly with a residual
    Turtle program, and return the updated streams Out'.
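
One possible reading of this specification is sketched below in Python. It is an assumption-laden illustration, not the router of the paper: a Turtle program is taken to be a list of the commands "left", "right", and "forward", the five output streams are modelled as Python lists keyed by "self" and the four headings, and the function name route merely mirrors the specification.

```python
# Illustrative router; the command set and data layout are assumptions.
CLOCKWISE = ["north", "east", "south", "west"]


def route(goal, heading, tp, out):
    """Append the goal to the appropriate stream in `out` and return `out`."""
    tp = list(tp)
    # Turns are handled locally: they only update the heading.
    while tp and tp[0] in ("left", "right"):
        step = 1 if tp.pop(0) == "right" else -1
        heading = CLOCKWISE[(CLOCKWISE.index(heading) + step) % 4]
    if not tp:
        # Nothing left to interpret: the goal is reduced on this processor.
        out["self"].append((goal, heading))
        return out
    command = tp.pop(0)
    if command != "forward":
        raise ValueError("unknown Turtle command: %r" % (command,))
    # Forward the goal, with the residual Turtle program if any remains.
    message = ((goal, heading), tp) if tp else (goal, heading)
    out[heading].append(message)
    return out


streams = {name: [] for name in ["self"] + CLOCKWISE}
route("g1", "north", ["right", "forward"], streams)
route("g2", "north", ["forward", "forward"], streams)
route("g3", "north", ["left"], streams)
```

A message that still carries a residual program is routed again by the receiving processor, so a multi-step Turtle program is consumed one hop at a time.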

The torus of processor processes can be mapped on an underlying torus using Turtle programs;
but who will interpret these Turtle programs? Booting an initial process network on the processor
network is necessary, and can be done using standard techniques. One solution is described in
[187].

In this scheme the underlying parallel implementation of the language does not have to support
remote process spawning in addition to distributed unification, since spawning is implemented at
the language level by standard message passing between meta-interpreter (or runtime support)
processes. Another mapping notation is described in [174].

Mapping  with  dynamic  load-balancing 

Dynamic load balancing requires that processors off-load work when they are too busy, and request
work when they are idle. A good dynamic load balancing algorithm distributes work evenly and
with little overhead. If the underlying machine has a notion of locality, i.e. communication costs
between processors are not uniform, then a dynamic load balancing algorithm should prefer local
distribution of work over global distribution, when possible.

We show here a simple implementation of dynamic load balancing using a centralised queue.
The scheme can be enhanced to use a distributed queue [185], and thus reduce contention and
increase locality.

Assume a network of processors, and a @next mapping command which places the process in
the next processor in some processor ordering. A distributed meta-interpreter performing dynamic
load balancing can be defined as follows:

processors(N,ToQ) ←
    queue(ToQ),
    processors'(N,ToQ)@next.

processors'(0,_).
processors'(N,ToQ) ←
    N>0 | N':=N-1,
    processor(ToQ),
    processors'(N',ToQ)@next.

processor(ToQ) ←
    reduce(true,ToQ).

reduce(true,ToQ) ←
    write(dequeue(A),ToQ,ToQ'),
    reduce(A,ToQ').
reduce((A,B),ToQ) ←
    write(enqueue(B),ToQ,ToQ'), reduce(A,ToQ').
reduce(A,ToQ) ←
    clause(A,B), reduce(B,ToQ).

queue(In) ←
    See Section 7.3.
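
The behavior of this interpreter can be mimicked by a sequential simulation. The Python sketch below is a rough model under stated assumptions, not an implementation of the program above: processors take turns in a fixed round-robin order, a conjunction is written as the tuple ("and", A, B), atoms are strings, and clauses is a hypothetical dictionary mapping an atom to its body.

```python
from collections import deque


def run(goal, clauses, n_processors):
    """Simulate reduce/2 on n_processors sharing one central queue.

    Returns the number of clause reductions performed by each processor.
    """
    queue = deque([goal])                # the central queue process
    current = ["true"] * n_processors    # every processor starts idle
    reductions = [0] * n_processors
    progressed = True
    while progressed:
        progressed = False
        for p in range(n_processors):
            g = current[p]
            if g == "true":
                if not queue:
                    continue             # the dequeue request stays pending
                current[p] = queue.popleft()
            elif isinstance(g, tuple):   # conjunction (A,B)
                _, a, b = g
                queue.append(b)          # off-load B to the queue
                current[p] = a           # keep reducing A locally
            else:
                current[p] = clauses[g]  # reduce the atom with its clause
                reductions[p] += 1
            progressed = True
    return reductions


clauses = {"a": ("and", "b", "c"), "b": "true", "c": "true"}
work = run("a", clauses, 2)
```

In this model a processor hands the second conjunct to the queue as soon as it meets a conjunction, so idle processors pick up work without any explicit request protocol beyond dequeue.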

Communication can be reduced, at the expense of slightly slower distribution of work, by placing
a buffer in each processor. The buffer forwards requests to the global queue only if it overflows
(has too many enqueue requests) or underflows (cannot satisfy a dequeue request). For example,
in experiments made on a 16-processor computer on a particular application, a buffer size of about
10 was found optimal [165].
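
The overflow and underflow rule can be illustrated with a short Python sketch. It is a simplification under assumptions: the global queue is a plain deque rather than a remote process, requests are served first-in first-out, and the class name Buffer and the counter forwarded are hypothetical.

```python
from collections import deque


class Buffer:
    """Per-processor buffer in front of a shared global queue."""

    def __init__(self, global_queue, capacity=10):
        self.local = deque()
        self.global_queue = global_queue
        self.capacity = capacity
        self.forwarded = 0  # requests that had to reach the global queue

    def enqueue(self, goal):
        if len(self.local) < self.capacity:
            self.local.append(goal)
        else:
            # Overflow: pass the request on to the global queue.
            self.global_queue.append(goal)
            self.forwarded += 1

    def dequeue(self):
        if self.local:
            return self.local.popleft()
        # Underflow: ask the global queue; None means no work is available.
        self.forwarded += 1
        return self.global_queue.popleft() if self.global_queue else None


shared = deque()
buf = Buffer(shared, capacity=2)
for goal in ("g1", "g2", "g3"):
    buf.enqueue(goal)
served = [buf.dequeue() for _ in range(4)]
```

Only the third enqueue and the last two dequeues touch the shared queue here; the remaining requests are served locally, which is the communication saving described above.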

Code  management 

General solutions to the code management problem are also applicable to concurrent logic
programming languages. One approach to the problem is described in [187].


20.  Applications  of  Concurrent  Logic  Programming  Languages 

Since their beginning, the design of concurrent logic languages has been closely coupled with the
development of prototype applications, which were used as feedback to the design process. The
application programs were those which testified to the little difference between flat and non-flat
languages from an expressiveness point of view. The systems programs were those which stretched
the synchronization capabilities of logic languages to their limits, and provided examples where
the power of atomic test unification and read-only unification shows through.

A description of numerous applications, as well as further references, can be found
in the Concurrent Prolog book [164]. The book reports on the implementation of parallel
and distributed algorithms, systems programming, and the implementation of embedded
languages, among others. Other applications of concurrent logic languages include
[35,67,98,99,118,137,144,145,148,157,158,172,181,183]. Combined, these applications bear witness to
the generality and versatility of the concurrent logic programming approach.

PART V. CONCLUSIONS

21.  Relation  to  Other  Languages  and  Computational  Models 

21.1  Prolog,  parallel  logic  languages  and  concurrent  constraint  languages 
Prolog 

Concurrent  logic  languages,  as  presently  defined,  are  not  an  alternative  to  Prolog.  They  are, 
in  a  sense,  lower  level  languages,  which  exhibit  their  strength  mainly  in  the  implementation  of 
parallel  algorithms,  distributed  systems,  reactive  systems,  and  in  systems  programming.  Hence 
the  question  of  the  integration  of  these  languages  with  higher-level  languages  in  general,  and  with 
Prolog  in  particular,  has  received  considerable  attention. 

One  of  the  initial  goals  in  the  design  of  Concurrent  Prolog  [160]  was  the  definition  of  a  language 
which  includes  Prolog  as  a  subset.  It  seemed  that  this  goal  was  not  realized  in  the  initial  design 
of  the  language,  and  hence  this  design  was  termed  “a  subset  of  Concurrent  Prolog”.  Later,  it 
was  found  out  that  an  Or-parallel  Prolog  interpreter  can  be  specified  easily  in  that  subset  (the 
interpreter  is  shown  in  Section  18)  and,  as  a  consequence,  that  the  original  design  did  achieve 
this  goal.  However,  the  move  to  flat  languages  opened  up  again  the  question  of  the  integration  of 
Prolog  and  concurrent  logic  languages. 

Two solutions were discussed in Section 14. One is to provide some interface between two
separate languages: some form of Prolog, and some concurrent logic language [24,25]. Another is to
embed some form of Prolog into a concurrent logic language [29,165,198,200]. A third solution is to
provide some of the mechanisms of concurrent logic languages via extensions to Prolog, such as freeze
[12] and wait declarations [132]. The problem with the first solution is that it does not really address
the essence of the problem, namely to find an integrated solution in which the various strengths
of logic programming can be brought to bear. It is applicable to any two programming languages,
not necessarily logic programming ones. The problem with the second solution is performance.
Techniques for efficient implementation of concurrent logic languages lag one step behind those
of sequential Prolog, and there are claims that the algorithms employed in the embedding of
Prolog in concurrent logic programming are not feasible. The third solution is largely limited to
transformational applications, since it cannot change the basic fact that Prolog is not a reactive
programming language.

CP(↓,|,&), Andorra, and Pandora

The  synchronization  and  commitment  mechanisms  of  concurrent  logic  languages  are  useful  also  in 
non-reactive  applications.  This  motivated  a  different  line  of  research  —  the  design  of  non-reactive 
languages  that  attempt  to  supersede  Prolog  in  expressiveness  and  performance,  without  being 
rooted  in  its  sequential  execution  model. 

Saraswat [150,153,157] investigated a parallel logic language, called CP(↓,|,&), that incorporates
both don’t-care and don’t-know nondeterminism, and synchronization by input matching.
Although an efficient implementation on top of sequential Prolog is described [Sad], the language
seems even more difficult to implement “for real” than the non-flat languages discussed in Section
18.

Yang and Aiso [209,210] also propose a language with don’t-care and don’t-know nondeterminism,
called P-Prolog, but use a different synchronization mechanism: the determinacy conditions
described in Section 12 on P-Prolog.

Recently, an elegant integration of the ideas of P-Prolog and of Or-parallel Prolog, called the
Andorra model, was proposed by D.H.D. Warren (personal communication), and integrated in the
Andorra language [72]. The idea is as follows: reduce in parallel determinate goal atoms as long as
possible (And-parallelism). When no determinate atoms remain, choose one atom for an Or-split:
create two or more subgoals, one for each clause unifiable with the chosen atom, and continue in
parallel reducing the resulting independent goals (Or-parallelism). Under the Andorra model pure
logic programs may exhibit the synchronization behavior of concurrent logic programs, yet enjoy a
complete proof procedure. If in an Or-split the leftmost atom is chosen, Andorra is more efficient
(in terms of the number of reductions required) than ordinary Or-parallel Prolog, since it prunes
the search space better.

The ideas in the Andorra model can also be employed in an implementation of the flat subset
of CP(↓,|,&). Another recent proposal along these lines is Pandora [8], a parallel logic language
incorporating PARLOG-like synchronization, and a mechanism for specifying which goal atom to
choose for an Or-split.

Concurrent  constraint  logic  programming 

The framework of constraint logic programming [92,31] has proved in recent years to be a powerful
generalization of logic programming, both from a theoretical and from a practical point of view.
Maher [124] suggested using concepts of constraint logic programming to provide a logical
characterization of synchronization in concurrent logic programming. The conditions for the success of
input matching and guard checking of a goal atom A with a clause A' ← G | B are customarily
defined operationally, as in this paper. Maher showed how this condition can be specified logically,
as the requirement that the accumulated constraint (corresponding to the accumulated
substitution in our model) entails the existential constraint (∃) A=A' ∧ G, where the existential quantifier
ranges over the variables local to the clause. Saraswat [156,157] developed these ideas further.
He developed a framework of concurrent constraint logic programming in which a computation
progresses by agents that communicate by placing constraints in a global store and synchronize
by checking that constraints are entailed by the store. Agents correspond to goal atoms, placing
constraints corresponds to unification, and checking constraints corresponds to matching and
guard checking in concurrent logic programming. Employing the concepts of consistency and
entailment between partial information (i.e. constraints), Saraswat was able to provide a logical
characterization of constraint-based constructs that correspond to non-atomic unification, atomic
test unification, read-only unification, test-and-set, and others. Constraint logic programming
offers a logical framework for dealing with domains other than Herbrand terms, such as boolean,
integer, and real arithmetic. Saraswat showed how such domains and others can be incorporated
in concurrent logic languages using this framework.
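
The division of labor between placing and checking constraints can be made concrete with a toy store. The Python sketch below is a deliberately restricted illustration and not Saraswat's framework: the only constraints it handles are equalities between a variable name and a constant, and the names Store, tell, and ask are assumptions chosen for this example.

```python
class Store:
    """Toy constraint store holding equalities of the form Var = constant."""

    def __init__(self):
        self.bindings = {}

    def tell(self, var, value):
        """Place the constraint var = value; reject it if inconsistent."""
        if self.bindings.get(var, value) != value:
            raise ValueError("inconsistent with the store: %s = %r" % (var, value))
        self.bindings[var] = value

    def ask(self, var, value):
        """Check whether the store entails var = value.

        Returns True if it is entailed, False if the store contradicts it,
        and None if the store does not yet decide (the agent suspends).
        """
        if var not in self.bindings:
            return None
        return self.bindings[var] == value


store = Store()
before = store.ask("X", "a")   # undetermined: an asking agent would suspend
store.tell("X", "a")           # another agent places the constraint
after = store.ask("X", "a")    # now entailed: the asking agent may proceed
```

The three-valued result of ask corresponds to the three outcomes of input matching in the operational model: success, failure, and suspension.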

The  initial  work  on  concurrent  constraint  logic  programming  is  very  promising,  and  one  may 
expect  that  it  will  have  as  much  theoretical  and  practical  impact  on  concurrent  logic  programming 
as  constraint  logic  programming  had  on  logic  programming. 

21.2  Distant  relatives  —  Delta  Prolog  and  Fleng 
Delta  Prolog 

Delta  Prolog  [140]  is  Prolog  augmented  with  CSP-like  communication  primitives.  Delta  Prolog  is 
different  from  the  other  languages  surveyed  in  two  respects.  First,  it  is  not  a  logic  programming 
language  in  the  sense  that  a  successful  computation  corresponds  to  a  proof  of  a  goal  statement,  and 
a  partial  computation  corresponds  to  proofs  of  a  conditional  statement.  Specifically,  the  role  of 
the  communication  primitives  of  Delta  Prolog  in  the  declarative  reading  of  programs  is  unclear.  In 
concurrent  logic  languages  the  synchronization  primitives  can  be  ignored  in  the  declarative  reading, 
since  they  affect  only  which  answer  substitution  is  found,  but  not  the  substitution  itself.  This  is 
not  the  case  in  Delta  Prolog.  Although  Delta  Prolog  can  be  given  axiomatic  semantics,  this  can  be 
done  for  any  programming  language,  not  only  for  a  logic  programming  one.  The  second  difference 
between  Delta  Prolog  and  the  other  languages  surveyed  is  that  Delta  Prolog  is  not  reactive,  since 
it  may  backtrack  on  communication. 

It is not clear yet in which application area the particular features of Delta Prolog show their
advantage.

Fleng

Fleng  [134,135,136]  is  a  simple  concurrent  programming  language  inspired  by  GHC  and  Kernel 
PARLOG  [23].  Its  syntax  uses  (guardless)  Horn  clauses.  Like  GHC,  it  uses  goal/clause  matching 
for  synchronisation,  and  its  unification  is  non-atomic.  Unlike  GHC,  unification,  as  well  as  any 
other  primitive,  reports  termination. 

Fleng has no notion of failure. Every primitive operation terminates and reports its termination
status. For example, the unification primitive unify(X,Y,Result) attempts to unify X and
Y. If it succeeds it assigns Result the value true; if it fails it assigns the value false.

In spite of its appearance, Fleng is not a logic programming language, since not every successful
computation corresponds to a proof of the goal statement. In particular, the goal unify(a,b,false)
terminates successfully, but apparently so does unify(a,b,true).

The insistence that a successful computation should correspond to a proof is not a mere nicety,
and Fleng cannot simply drop the title of being a logic programming language and live happily
ever after. The concept of failure serves the fundamental role of an exception mechanism in logic
programming. In its absence, some other mechanism must be developed. As is evident from other
languages [73], a sound exception handling mechanism is not a trivial component of a language,
and its incorporation in Fleng would certainly complicate its semantics. Specifically, if Fleng’s
present exception handling mechanism (namely the Result variable of each primitive) cannot be
used to report the exception, as in the call unify(a,b,true), what exception should be raised? The
most natural one is to fail the computation, which brings us back to square one...

If  failure  is  reinstated  in  Fleng,  then  it  becomes  similar  in  expressiveness  to  KL1,  since  it  can 
be  naturally  embedded  in  KL1  and  vice  versa. 

21.3  Dataflow  languages 

Concurrent logic languages share with dataflow languages [1] single-assignment (or write-once)
variables and dataflow synchronisation. However, this is mainly a similarity in spirit, not in
implementation. The basic operation that is synchronized by dataflow in concurrent logic languages
is the process try. It corresponds typically to several tens, up to several hundreds, of conventional
machine instructions. In contrast, the synchronised operation in dataflow models corresponds
typically to one conventional machine instruction. This difference explains why realisations of
concurrent logic languages on conventional hardware have acceptable synchronisation overhead,
whereas dataflow languages seem to necessitate a specialised architecture.

Other differences between the two models are that dataflow languages are typically deterministic,
whereas concurrent logic languages are not, and that dataflow languages and architectures
are typically geared for scalar operations, whereas logic languages operate mainly on compound
data structures, which may contain logical variables.

21.4  Functional  languages 

Much has been said on the relation between functional and logic languages [36]. In the context
of concurrent programming, the major observation is that functional languages are, by design and
ideology, transformational, rather than reactive. Functional programs denote time-independent
functions from inputs to output, and notions of state, synchronisation, communication, and
nondeterminism are alien to them.

Functional programs can be parallelized, and often yield efficient parallel algorithms. However,
without major extensions [7,52,53,54,69,77], which seem to undermine their original motivation and
‘semantic elegance’, functional programming languages cannot be used for the specification and
implementation of reactive systems.

Concurrent logic languages, on the other hand, have explicit notions of processes, process state,
communication, synchronization, and nondeterminism. Furthermore, processes can have several
outputs, and inputs and outputs of processes can be combined into arbitrary process networks.
These, combined with properties of the logical variable, seem to be the source of their power as
concurrent languages; all are absent from the base model of functional languages.

In addition, it seems that there are usually simple translations from concurrent functional
languages to concurrent logic languages [117]. Thus, a possible architecture of a parallel computer
system, which provides both styles of programming, each for the application it suits best, is a system
in which the base language is a concurrent logic programming language, which implements the
underlying operating system and programming environment, and higher-level functional languages
are implemented by translation to it. Such an architecture is proposed by [117].


The origins of concurrent logic programming languages can be traced back to the work of Kahn
and MacQueen [94], which offered a model of concurrency based on deterministic asynchronous
processes computing relations over data streams. Van Emden and de Lucena [42] were intrigued
by this notion, and showed how one can use logic programs to specify such processes. Clark and
Gregory [20] took these ideas a crucial step further and, influenced by the notions of CSP [80,87],
introduced synchronisation and committed-choice nondeterminism into logic programs.

Concurrent logic languages are similar to CSP and Occam [91] in their notion of processes,
nondeterminism, and synchronisation via communication. They are similar to Occam, and different
from CSP and Actors [78], in that processes communicate via ‘ports’ (realised by logical variables)
rather than by naming the destination process or object.

One difference between CSP and Occam on the one hand and concurrent logic languages on
the other hand is the type of communication and synchronisation they employ. In the former
communication is synchronous; in the latter asynchronous. In the former a communication channel
is necessarily point-to-point. In the latter it is, in the general case, many-to-many. We find the
added flexibility of the communication protocols available in concurrent logic languages over those
of CSP and Occam quite apparent. The additional overhead entailed by this added flexibility is
yet to be determined. Presently, it is not clear for which tasks Occam-like point-to-point
synchronous protocols are inherently more efficient than the general asynchronous protocols employed
in concurrent logic languages, and vice versa.

Another fundamental difference is that CSP and Occam can operate on and communicate
only “ground” data, whereas the ability to communicate and share incomplete data structures, i.e.
data structures containing logical variables, is fundamental to concurrent logic languages, and is
their main source of expressive power.

Being concrete programming languages, concurrent logic languages are not directly comparable
to abstract computation models such as CCS. However, it seems that if one abstracts away the
details of the data domain (i.e. terms and unification), and concentrates on the synchronization
aspect of concurrent logic languages, then models which can be thought of as the asynchronous
counterparts of CCS [129] emerge [65].

Although  the  syntax  of  CSP  and  CCS  seems  superficially  different  from  that  of  concurrent 
logic  languages,  there  is  a  close  analogy  between  the  basic  operators  of  the  two  families,  shown  in 
Figure  8. 

21.6  Concurrent  object-oriented  programming 

The underlying operational model of concurrent logic languages resembles that of concurrent
object-oriented models, such as Actors [78], in that both consist of a dynamic collection of
lightweight processes, computing by performing local computations and exchanging messages. There
are, however, several apparent differences.

First, Actor objects, like CSP processes, address each other by name, and not via channels.
The advantage of channels over object names is modularity and abstraction; this has led Occam’s
designers to depart from CSP in this respect. It is easier to connect one process network to another
by assigning output channels of one to input channels of the other than by informing one of the names
or mail addresses of the appropriate processes in the other. Channels are also more abstract, since
knowing a channel does not imply knowing who receives or sends messages on that channel. A
process can have several input channels, which provide different access modes to its local data; this
feature can be the basis of a capability system. Several processes may listen on the same channel,
each handling a different set of messages, or handling a different aspect of a message.

Figure 8: Analogy between CCS/CSP operators and guarded Horn clauses

If one is able to pass channels in messages, as in logic languages, then channels have another,
perhaps more fundamental, advantage over name-based addressing. Process names in messages,
like incomplete messages, can be used for network reconfiguration. However, this is only
one particular application of incomplete messages. The use of incomplete messages in the
back-communication protocol, in dialogues, in the bounded-buffer protocol, in the duplex-stream
protocol, and in others is based on the ability to allocate communication channels on the fly, and on the
fact that the channel implicitly embeds some context information, which is used in the protocol.
There is no natural way to achieve these effects in name-based addressing.
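
The back-communication protocol mentioned above can be imitated outside a logic language with a write-once cell. The following Python sketch is only an analogy under assumptions: LogicVar is a hypothetical stand-in for a logical variable, the message format ("lookup", Key, Reply) is invented for the example, and the stream is an ordinary list processed sequentially.

```python
class LogicVar:
    """Write-once cell standing in for a logical variable."""

    _UNBOUND = object()

    def __init__(self):
        self._value = LogicVar._UNBOUND

    def is_bound(self):
        return self._value is not LogicVar._UNBOUND

    def bind(self, value):
        if self.is_bound():
            raise ValueError("variable is already bound")
        self._value = value

    def get(self):
        if not self.is_bound():
            raise ValueError("variable is not bound yet")
        return self._value


def server(stream, table):
    """Answer each incomplete message by binding the variable it carries."""
    for op, key, reply in stream:
        if op == "lookup":
            reply.bind(table.get(key))


# The client allocates a fresh reply channel inside the message itself.
reply = LogicVar()
server([("lookup", "x", reply)], {"x": 1})
```

The server never learns who sent the request; the unbound variable in the message is the return channel, allocated on the fly and carrying its own context.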

The drawback of concurrent logic languages, compared to Actor-like languages, is not their
underlying operational model, but rather the verbose syntax required for expressing object-oriented
programs. The description of an object with one input channel and some state variables in a
concurrent logic language has the typical form:

p([Message|In],...state variables...) ←
    ...handle Message, update state variables...,
    p(In,...new state variables...).

Furthermore, when several processes share the same output channel (“talk to the same object”),
then some protocol, such as the spawning of a merge network, needs to be followed. This is in
contrast to Actor-like languages, in which state variables are assumed not to change unless a
change is stated explicitly, and explicit mergers need not be created in front of receiving objects,
since they are assumed implicitly.

Another bookkeeping service provided automatically by object-oriented languages is object
deallocation; when there are no more references to an object, it is deallocated, and its storage is
reclaimed. In concurrent logic languages, unreferenced data structures are reclaimed by garbage
collection, but the conditions for process termination must be specified explicitly, by one or more
unit clauses. Sometimes the burden of doing so manually would better be avoided.

A mechanism for detecting that a variable is referenced only by one process [17] can be used
for garbage collecting processes: a process that detects that it is the only one referencing its input
stream may perform some cleanup operations (e.g. close its output streams or unify its segment of
a short-circuit) and terminate (K. Kahn, personal communication). Although the pragmatics of
this mechanism is quite well understood, its logical semantics still needs to be worked out.

The question of the proper integration of inheritance in a concurrent object-oriented framework
is still open. Delegation was suggested as a mechanism which is more suitable to a concurrent
framework. As discussed in Section 7.6, objects which delegate incomprehensible messages can be
specified in concurrent logic languages by augmenting the process with an additional output stream,
and adding a delegating clause which uses the otherwise construct. This mechanism, however, is
also quite verbose.

These observations have led to the design of new object-oriented languages, such as Vulcan
[95], POLKA, and POOL [35]. These languages attempt to enjoy the best of both worlds. They
adopt the channel concept of concurrent logic languages, but do not require explicit repetition
of state variables, explicit mergers, or an explicit delegation mechanism. Another important design
consideration for these languages was that their implementation be in terms of natural and efficient
translations to concurrent logic languages. This would allow the exploitation of implementations of
such languages, as well as support integration between applications that are best described by an
object-oriented language, and applications that enjoy the full power of concurrent logic languages.

Consider  the  standard  bank  account  example.  In  the  Vulcan  language,  a  process  with  the 
desired  behavior  is  specified  by  the  following  program: 

class(account, [Balance=0, Name="No Name Given", Errors, ...]).

account :: deposit(Amount) →
    new Balance := Balance + Amount.

account :: balance(Balance).

account :: withdraw(Amount) →
    Balance > Amount
        ifTrue new Balance := Balance - Amount
        ifFalse Errors : Overdrawn(Name, Balance, Amount, ...).

A more conservative solution in the same direction was to devise a new ‘surface syntax’ for
concurrent logic programs, rather than a completely new language. The surface syntax, called
logic programs with implicit variables [96], allows specifying only what has changed in the process’s
state during a transition, rather than the entire old and new states explicitly, as required by plain
logic programs. In addition, it has a special notation to support stream communication and array
operations. For example, the bank account in FCP(|) with implicit variables would be specified as
follows:

procedure account(In)+(Balance=0,Name=‘No Name Given’,Errors,...).

account ←
    In ? deposit(Amount) |
    Balance' := Balance + Amount,
    account.
account ←
    In ? balance(Balance) |
    account.
account ←
    In ? withdraw(Amount),
    Balance > Amount |
    Balance' := Balance - Amount,
    account.
account ←
    In ? withdraw(Amount),
    Balance < Amount |
    Errors ! Overdrawn(Name,Balance,Amount,...),
    account.

The variable X' specifies the new value of the process argument X. The stream notation
Xs ? M is a shorthand for the input matching Xs=[M|Xs'], and Xs ! M is a shorthand for the same
unification.
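
For readers more familiar with conventional languages, the behavior of the account process can be approximated as a loop over a message stream. The Python sketch below is an analogy under assumptions, not a translation of either program: messages are tuples, the balance query carries a list that stands in for the unbound variable in the message, the Errors stream is a list returned at the end, and the withdrawal test follows the two-way ifTrue/ifFalse form of the Vulcan version.

```python
def account(stream, balance=0, name="No Name Given"):
    """Consume a stream of messages and return (final balance, errors)."""
    errors = []
    for message in stream:
        kind = message[0]
        if kind == "deposit":
            # Only the changed state variable is mentioned; the rest is kept.
            balance = balance + message[1]
        elif kind == "balance":
            # Stand-in for unifying the variable in the message with Balance.
            message[1].append(balance)
        elif kind == "withdraw":
            amount = message[1]
            if balance > amount:
                balance = balance - amount
            else:
                errors.append(("overdrawn", name, balance, amount))
    return balance, errors


answer = []
final, errs = account([
    ("deposit", 100),
    ("withdraw", 30),
    ("balance", answer),
    ("withdraw", 500),
])
```

The loop variable plays the role of the recursive call on the rest of the stream, and the local variables play the role of the implicit state arguments.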

Unlike in Vulcan, this notation is employed only for stream, rather than channel [194],
communication. An extension of this approach to incorporate channels as an abstract data type is being
investigated.

21.7  Linda 

Linda [13,2] is a set of primitives that operate concurrently on a multiset of tuples, called a Tuple
Space. Tuples in a Tuple Space are accessed associatively using a degenerate form of unification
between tuples and tuple templates. The basic operations are out(T) (insert a tuple T into the Tuple
Space), in(T) (delete a tuple matching T, instantiating variables in T; block if a matching tuple is
not available) and rd(T) (find a tuple matching T, instantiate variables in T). A fourth primitive,
eval, supports process forking. Augmenting a conventional sequential programming language
with these Linda primitives results in a concurrent programming language in which processes
communicate and synchronise via the Tuple Space.
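
As an illustration of the three basic operations, here is a minimal Python sketch of a Tuple Space shared by threads. It is not Linda's actual implementation: the class name TupleSpace, the method name in_ (in is a reserved word in Python), and the convention that None in a template matches any field are assumptions of this sketch. Matching returns the whole tuple rather than instantiating template variables, rd is made blocking as in standard Linda (the description above does not state this), and eval is omitted.

```python
import threading


class TupleSpace:
    """A multiset of tuples with associative, blocking access."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _match(template, tup):
        # Degenerate unification: same length, and every non-wildcard
        # field of the template equals the corresponding tuple field.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def _find(self, template):
        for tup in self._tuples:
            if self._match(template, tup):
                return tup
        return None

    def out(self, tup):
        """Insert a tuple and wake up any waiting readers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def in_(self, template):
        """Remove and return a matching tuple; block until one exists."""
        with self._cond:
            while (tup := self._find(template)) is None:
                self._cond.wait()
            self._tuples.remove(tup)
            return tup

    def rd(self, template):
        """Return a matching tuple without removing it; block until one exists."""
        with self._cond:
            while (tup := self._find(template)) is None:
                self._cond.wait()
            return tup
```

A consumer thread that calls in_(("job", None)) before any job tuple exists simply suspends until some producer calls out(("job", 42)), which is how processes both communicate and synchronise through the space.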

A comparison of Linda and concurrent logic programs is given in [13]. A critique of this
comparison, which demonstrates an embedding of Linda's primitives in a variant of FCP(:), is
given in [166].

21.8  Nondeterministic  transition  systems  and  UNITY 

Nondeterministic transition systems are a natural method for specifying concurrent systems. Indeed,
we have given the semantics of concurrent logic programming languages using nondeterministic
transition systems. Recently, a notation for specifying concurrency, called UNITY [16], was
proposed. UNITY is based on unbounded iterative nondeterministic transitions.

Concurrent  logic  languages  share  with  UNITY  the  goal  of  being  a  foundation  for  a  general 
purpose  concurrent  programming  language,  the  belief  that  the  execution  model  of  such  a  language 
should be abstract, rather than being tied to a concrete architecture, and the conviction that
nondeterminism is an essential component in such a model. Another point in common between
UNITY and the stronger concurrent logic languages is the size of the atomic operation: both the
simultaneous  assignment  of  UNITY  and  atomic  unification  in  languages  such  as  FCP(:)  involve 
atomic  transactions  which  read  from  and  write  to  several  variables. 

One  difference  between  UNITY  and  concurrent  logic  languages  is  the  notion  of  a  process.  A 
UNITY  program  has  one  global  state,  and  transitions  operating  on  it,  possibly  concurrently;  it 
does  not  have  an  explicit  notion  of  a  process.  Concurrent  logic  programs  have  a  natural  notion 
of  a  process.  However,  this  difference  is  only  apparent.  The  notion  of  a  process  in  concurrent 
logic programs is in the eyes of the beholder; it is not an inherent part of the transition system of
concurrent logic programs. Similarly, one can often identify "processes" in UNITY programs, if
one so desires.

Another difference between UNITY and concurrent logic languages is the notion of termination.
Concurrent logic programs terminate by explicit instructions. UNITY programs terminate
implicitly, by reaching a fixpoint. One implication of this decision is that UNITY does not distinguish
between successful termination and deadlock. We feel that this difference is mostly a matter of
definition: one can define a different model of concurrent logic programs in which termination is by
fixpoint; similarly, one can define a variant of UNITY that is like UNITY except that there are
explicit termination conditions. In our opinion, explicit termination is preferable both from the
programmer's and from the implementor's point of view in both models.
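
The execution style discussed above, repeated nondeterministic selection of transitions until a fixpoint is reached, can be sketched in Python as follows. This is an illustration under stated assumptions, not UNITY notation: the function names are hypothetical, the state is an immutable tuple, and the scheduler picks at random among the statements that would change the state. UNITY itself selects any statement subject to a fairness constraint; since a statement that leaves the state unchanged is a no-op, skipping it is a simplification that reaches the same states.

```python
import random


def run_to_fixpoint(state, statements, rng=None):
    """Apply randomly chosen statements until none of them changes the state.

    statements: functions mapping a state to a (possibly identical) state.
    Returns the fixpoint state and the number of effective steps taken.
    """
    if rng is None:
        rng = random.Random()
    steps = 0
    while True:
        # A statement is "enabled" here if executing it changes the state.
        enabled = [s for s in statements if s(state) != state]
        if not enabled:
            return state, steps          # fixpoint: implicit termination
        state = rng.choice(enabled)(state)
        steps += 1


def swap_if_out_of_order(i):
    """Statement: simultaneously assign x[i], x[i+1] := x[i+1], x[i] if x[i] > x[i+1]."""
    def statement(xs):
        if xs[i] > xs[i + 1]:
            ys = list(xs)
            ys[i], ys[i + 1] = ys[i + 1], ys[i]
            return tuple(ys)
        return xs
    return statement


def unity_sort(values, seed=None):
    """Sort by running one swap statement per adjacent pair to a fixpoint."""
    stmts = [swap_if_out_of_order(i) for i in range(len(values) - 1)]
    final, _ = run_to_fixpoint(tuple(values), stmts, random.Random(seed))
    return list(final)
```

The loop has no explicit halt instruction: it stops only because no statement can change the state any more. The same test would also fire in a state where the program is stuck short of its goal, which is the sense in which fixpoint termination does not separate success from deadlock.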

We find the fundamental difference between UNITY and concurrent logic languages in the
notion of a variable. In UNITY, variables are mutable; therefore a transition must exclude other
transitions from writing on variables it reads from, and from accessing variables it writes to. In
concurrent logic languages, variables are single-assignment, therefore no mutual exclusion mechanisms
are  required  when  reading  a  variable.  The  effect  of  mutable  unshared  variables  can  be  achieved 
nonetheless  in  concurrent  logic  languages,  as  explained  in  Section  7,  using  iterative  processes. 

It seems that this fundamental difference is the source of another difference between UNITY
and concurrent logic languages, namely their attitude to architectures. Although both are
architecture independent, the gap between the general UNITY model and concrete architectures, such as a
non-shared memory parallel computer, is sufficiently large that the authors of UNITY suggest that
special  sublanguages  should  be  tailored  for  particular  parallel  architectures.  In  contrast,  authors 
of  concurrent  logic  languages  believe  their  languages  are  suitable  for  all  architectures.  The  bur¬ 
den  of  matching  the  application  to  the  architecture  resides  solely  with  the  algorithm  designer  and 
programmer.  The  belief,  which  is  backed  by  the  implementation  efforts,  is  that  concurrent  logic 
languages  are  suitable  for  a  wide  range  of  architectures,  including  synchronous  and  asynchronous 
shared-memory  computers,  and  tightly  and  loosely  coupled  non-shared  memory  computers.  The 
difference  between  these  architectures  is  not  necessarily  in  the  concurrent  logic  language  suitable  for 
them,  but  rather  in  the  tradeoffs  in  communication  and  computation  they  offer,  which  determine 
which  algorithms  will  better  match  a  particular  architecture. 

This  difference  is  not  a  coincidence.  The  single  assignment  property  of  logic  variables  means 
that  even  in  a  language  with  atomic  test  unification,  locking  of  variables  is  very  rarely  necessary. 
Specifically,  it  is  necessary  almost  only  when  the  atomicity  of  unification  is  actually  exploited  to 
achieve some synchronisation task. For example, in simple benchmarks of the parallel implementation
of FCP(?) on the iPSC hypercube, more than 95% of the message traffic was associated
with reading remote values (which does not require locking because of the single assignment
property), and less than 5% with locking remote variables [191]. This is achieved without any special
compilation  or  program  analysis  techniques.  In  UNITY,  on  the  other  hand,  in  the  absence  of 
additional  information,  every  transition  which  accesses  more  than  one  variable  requires  locking  all 
variables  accessed.  Therefore  special  sublanguages,  which  are  structured  to  mimic  the  underlying 
architecture,  have  to  be  employed  to  make  the  model  realistic. 
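
The asymmetry between reading and writing a single-assignment variable can be illustrated with a small Python sketch. It is not how any of the cited implementations represent variables: the class LogicVar and its methods bind and read are hypothetical names, binding accepts only a ground Python value rather than performing unification, and the Event used to suspend readers has internal synchronisation of its own. The point of the sketch is only that a bound value never changes, so readers never need to exclude one another or the writer once the value is available.

```python
import threading


class LogicVar:
    """A write-once variable: readers suspend until it is bound."""

    _UNBOUND = object()

    def __init__(self):
        self._value = LogicVar._UNBOUND
        self._bound = threading.Event()
        self._bind_lock = threading.Lock()   # needed only by writers

    def bind(self, value):
        """Bind the variable; rebinding to a different value is an error."""
        with self._bind_lock:
            if self._bound.is_set():
                if self._value != value:
                    raise ValueError("conflicting binding")
                return
            self._value = value
            self._bound.set()

    def read(self, timeout=None):
        """Wait until the variable is bound, then return its value.

        No lock is taken around the value itself: once bound, it is immutable.
        """
        if not self._bound.wait(timeout):
            raise TimeoutError("variable is still unbound")
        return self._value
```

In this sketch the lock is taken only in bind, where two writers might race; a mutable variable would instead need protection on every access by a transition that reads or writes it.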

On a methodological level, there are other differences between the approach of UNITY and that
of concurrent logic languages. UNITY does not attempt to address questions of meta-programming
and systems programming, or, more generally, how a parallel computer system whose base
language is UNITY would be constructed. This question has been fundamental to concurrent logic
programming from its beginning.


22.  Conclusion 

This survey attempted to convey the soundness, breadth, and potential of the logic programming
approach to concurrency. Progress in the following directions would help realise this potential fully:

•  Provide competitive implementations of concurrent logic languages for sequential, parallel and
distributed computers.

•  Develop simpler semantic foundations for concurrent logic languages.

•  Exploit the simplicity of these languages to provide advanced program development environments
and tools.

•  Exploit the simplicity of these languages to provide advanced program analysis, transformation,
and optimisation techniques, to aid in their efficient implementation.

•  Further develop programming methodologies and techniques for these languages.

•  Enhance concurrent logic programming by incorporating ideas and methods from constraint
logic programming.

•  Further explore techniques for embedding higher-level languages, and design higher-level languages
(such as parallel constraint programming languages) especially suitable for embedding
in concurrent logic languages.
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